Skip to content

Conversation

@mrdrivingduck
Copy link
Member

No description provided.

tglsfdc and others added 30 commits August 13, 2025 11:59
It turns out that on some platforms (at least current macOS, NetBSD,
OpenBSD) semget(2) will return EINVAL if there is a pre-existing
semaphore set with the same key and too few semaphores.  Our code
expects EEXIST in that case and treats EINVAL as a hard failure,
resulting in failure during initdb or postmaster start.

POSIX does document EINVAL for too-few-semaphores-in-set, and is
silent on its priority relative to EEXIST, so this behavior arguably
conforms to spec.  Nonetheless it's quite problematic because EINVAL
is also documented to mean that nsems is greater than the system's
limit on the number of semaphores per set (SEMMSL).  If that is
where the problem lies, retrying would just become an infinite loop.

To resolve this contradiction, retry after EINVAL, but also install a
loop limit that will make us give up regardless of the specific errno
after trying 1000 different keys.  (1000 is a pretty arbitrary number,
but it seems like it should be sufficient.)  I like this better than
the previous infinite-looping behavior, since it will also keep us out
of trouble if (say) we get EACCES due to a system-level permissions
problem rather than anything to do with a specific semaphore set.

This problem has only been observed in the field in PG 17, which uses
a higher nsems value than other branches (cf. 38da053, 810a8b1).
That makes it possible to get the failure if a new v17 postmaster
has a key collision with an existing postmaster of another branch.
In principle though, we might see such a collision against a semaphore
set created by some other application, in which case all branches are
vulnerable on these platforms.  Hence, backpatch.

Reported-by: Gavin Panella <[email protected]>
Author: Tom Lane <[email protected]>
Discussion: https://postgr.es/m/CALL7chmzY3eXHA7zHnODUVGZLSvK3wYCSP0RmcDFHJY8f28Q3g@mail.gmail.com
Backpatch-through: 13
We do not want to trigger some tasks by default, to avoid using too many
compute credits. These tasks have to be manually triggered to be run. But
e.g. for cfbot we do have sufficient resources, so we always want to start
those tasks.

With this commit, an individual repository can be configured to trigger
them automatically using an environment variable defined under
"Repository Settings", for example:

REPO_CI_AUTOMATIC_TRIGGER_TASKS="mingw netbsd openbsd"

This will enable cfbot to turn them on by default when running tests for the
Commitfest app.

Backpatch this back to PG 15, even though PG 15 does not have any manually
triggered task. Keeping the CI infrastructure the same seems advantageous.

Author: Andres Freund <[email protected]>
Co-authored-by: Thomas Munro <[email protected]>
Co-authored-by: Nazir Bilal Yavuz <[email protected]>
Reviewed-by: Nazir Bilal Yavuz <[email protected]>
Discussion: https://postgr.es/m/20240413021221.hg53rvqlvldqh57i%40awork3.anarazel.de
Backpatch-through: 16
Handle 'ci-os-only' occurrences in the .cirrus.star file instead of
.cirrus.tasks.yml file. Now, 'ci-os-only' occurrences are controlled
from one central place instead of dealing with them in each task.

Author: Andres Freund <[email protected]>
Reviewed-by: Nazir Bilal Yavuz <[email protected]>
Discussion: https://postgr.es/m/20240413021221.hg53rvqlvldqh57i%40awork3.anarazel.de
Backpatch: 15-, where CI support was added
This seems to have been broken back in be0a666.

Reported-by: Hayato Kuroda (Fujitsu) <[email protected]>
Author: David Rowley <[email protected]>
Discussion: https://postgr.es/m/OSCPR01MB14966E11EEFB37D7857FCEDB7F535A@OSCPR01MB14966.jpnprd01.prod.outlook.com
Backpatch-through: 14
Recent changes to src/tools/ci/README triggered warnings like

    src/tools/ci/README:88: leftover conflict marker

Raise conflict-marker-size in .gitattributes to avoid these.
Commit c5b7ba4 changed things so that the ri_RootResultRelInfo field
of this struct is set for both partitions and inheritance children and
used for tuple routing and transition capture (before that commit, it
was only set for partitions to route tuples into), but failed to update
these comments.

Author: Etsuro Fujita <[email protected]>
Reviewed-by: Dean Rasheed <[email protected]>
Discussion: https://postgr.es/m/CAPmGK14NF5CcdCmTZpxrvpvBiT0y4EqKikW1r_wAu1CEHeOmUA%40mail.gmail.com
Backpatch-through: 14
The DROP SUBSCRIPTION command performs several operations: it stops the
subscription workers, removes subscription-related entries from system
catalogs, and deletes the replication slot on the publisher server.
Previously, this command acquired an AccessExclusiveLock on
pg_subscription before initiating these steps.

However, while holding this lock, the command attempts to connect to the
publisher to remove the replication slot. In cases where the connection is
made to a newly created database on the same server as subscriber, the
cache-building process during connection tries to acquire an
AccessShareLock on pg_subscription, resulting in a self-deadlock.

To resolve this issue, we reduce the lock level on pg_subscription during
DROP SUBSCRIPTION from AccessExclusiveLock to RowExclusiveLock. Earlier,
the higher lock level was used to prevent the launcher from starting a new
worker during the drop operation, as a restarted worker could become
orphaned.

Now, instead of relying on a strict lock, we acquire an AccessShareLock on
the specific subscription being dropped and re-validate its existence
after acquiring the lock. If the subscription is no longer valid, the
worker exits gracefully. This approach avoids the deadlock while still
ensuring that orphan workers are not created.

Reported-by: Alexander Lakhin <[email protected]>
Author: Dilip Kumar <[email protected]>
Reviewed-by: vignesh C <[email protected]>
Reviewed-by: Hayato Kuroda <[email protected]>
Reviewed-by: Amit Kapila <[email protected]>
Backpatch-through: 13
Discussion: https://postgr.es/m/[email protected]
This commit adds CHECK_FOR_INTERRUPTS to loops iterating over shared
buffers in several pg_buffercache functions, allowing them to be
interrupted during long-running operations.

Backpatch to all supported versions. Add CHECK_FOR_INTERRUPTS to the
loop in pg_buffercache_pages() in all supported branches, and to
pg_buffercache_summary() and pg_buffercache_usage_counts() in version
16 and newer.

Author: SATYANARAYANA NARLAPURAM <[email protected]>
Discussion: https://postgr.es/m/CAHg+QDcejeLx7WunFT3DX6XKh1KshvGKa8F5au8xVhqVvvQPRw@mail.gmail.com
Backpatch-through: 13
Some replication slot manipulations (logical decoding via SQL,
advancing) were failing an assertion when releasing a slot in
single-user mode, because active_pid was not set in a ReplicationSlot
when its slot is acquired.

ReplicationSlotAcquire() has some logic to be able to work with the
single-user mode.  This commit sets ReplicationSlot->active_pid to
MyProcPid, to let the slot-related logic fall-through, considering the
single process as the one holding the slot.

Some TAP tests are added for various replication slot functions with the
single-user mode, while on it, for slot creation, drop, advancing, copy
and logical decoding with multiple slot types (temporary, physical vs
logical).  These tests are skipped on Windows, as direct calls of
postgres --single would fail on permission failures.  There is no
platform-specific behavior that needs to be checked, so living with this
restriction should be fine.  The CI is OK with that, now let's see what
the buildfarm tells.

Author: Hayato Kuroda <[email protected]>
Reviewed-by: Paul A. Jungwirth <[email protected]>
Reviewed-by: Mutaamba Maasha <[email protected]>
Discussion: https://postgr.es/m/OSCPR01MB14966ED588A0328DAEBE8CB25F5FA2@OSCPR01MB14966.jpnprd01.prod.outlook.com
Backpatch-through: 13
The description of this GUC provides a list of the situations where
full-page writes are generated.  However, it is not completely exact,
mentioning only the cases where full_page_writes=on or base backups.  It
is possible to generate full-page writes in more situations than these
two, making the description confusing as it implies that no other cases
exist.

The description is slightly reworded to take into account that other
cases are possible, without mentioning them directly to minimize the
maintenance burden should FPWs be generated in more contexts in the
future.

Author: Jingtang Zhang <[email protected]>
Reviewed-by: Andrey Borodin <[email protected]>
Reviewed-by: Xuneng Zhou <[email protected]>
Discussion: https://postgr.es/m/CAPsk3_CtAYa_fy4p6=x7qtoutrdKvg1kGk46D5fsE=sMt2546g@mail.gmail.com
Backpatch-through: 13
Temporary relations may share the same RelFileNumber with a permanent
relation, or other temporary relations associated with other sessions.

Being able to uniquely identify a temporary relation would require
RelidByRelfilenumber() to know about the proc number of the temporary
relation it wants to identify, something it is not designed for since
its introduction in f01d1ae.

There are currently three callers of RelidByRelfilenumber():
- autoprewarm.
- Logical decoding, reorder buffer.
- pg_filenode_relation(), that attempts to find a relation OID based on
a tablespace OID and a RelFileNumber.

This makes the situation problematic particularly for the first two
cases, leading to the possibility of random ERRORs due to
inconsistencies that temporary relations can create in the cache
maintained by RelidByRelfilenumber().  The third case should be less of
an issue, as I suspect that there are few direct callers of
pg_filenode_relation().

The window where the ERRORs are happen is very narrow, requiring an OID
wraparound to create a lookup conflict in RelidByRelfilenumber() with a
temporary table reusing the same OID as another relation already cached.
The problem is easier to reach in workloads with a high OID consumption
rate, especially with a higher number of temporary relations created.

We could get pg_filenode_relation() and RelidByRelfilenumber() to work
with temporary relations if provided the means to identify them with an
optional proc number given in input, but the years have also shown that
we do not have a use case for it, yet.  Note that this could not be
backpatched if pg_filenode_relation() needs changes.  It is simpler to
ignore temporary relations.

Reported-by: Shenhao Wang <[email protected]>
Author: Vignesh C <[email protected]>
Reviewed-By: Ashutosh Bapat <[email protected]>
Reviewed-By: Robert Haas <[email protected]>
Reviewed-By: Kyotaro Horiguchi <[email protected]>
Reviewed-By: Takamichi Osumi <[email protected]>
Reviewed-By: Michael Paquier <[email protected]>
Reviewed-By: Masahiko Sawada <[email protected]>
Reported-By: Shenhao Wang <[email protected]>
Discussion: https://postgr.es/m/[email protected]
Backpatch-through: 13
Commit 0decd5e missed DO_DEFAULT_ACL,
leading to assertion failures, potential dump order instability, and
spurious schema diffs.  Back-patch to v13, like that commit.

Reported-by: Alexander Lakhin <[email protected]>
Author: Kirill Reshke <[email protected]>
Discussion: https://postgr.es/m/[email protected]
Backpatch-through: 13
v17 introduced the MAINTAIN ON TABLES privilege.  That changed the
applicable "baseacls" reaching buildACLCommands().  That yielded
spurious TestUpgradeXversion diffs.  Change to use a TYPES privilege.
Types have the same one privilege in all supported versions, so they
avoid the problem.  Per buildfarm.  Back-patch to v13, like that commit.

Discussion: https://postgr.es/m/[email protected]
Backpatch-through: 13
The CHECK_FOR_INTERRUPTS call in gingetbitmap turns out to be
inadequate to prevent a long uninterruptible loop, because
we now know a case where looping occurs within scanGetItem.
While the next patch will fix the bug that caused that, it
seems foolish to assume that no similar patterns are possible.
Let's do the CFI within scanGetItem's retry loop, instead.
This demonstrably allows canceling out of the loop exhibited
in bug #19031.

Bug: #19031
Reported-by: Tim Wood <[email protected]>
Author: Tom Lane <[email protected]>
Discussion: https://postgr.es/m/[email protected]
Backpatch-through: 13
Commit 4b754d6 introduced the concept of an excludeOnly scan key,
which cannot select matching index entries but can reject
non-matching tuples, for example a tsquery such as '!term'.  There are
poorly-documented assumptions that such scan keys do not appear as the
first scan key.  ginNewScanKey did nothing to ensure that, however,
with the result that certain GIN index searches could go into an
infinite loop while apparently-equivalent queries with the clauses in
a different order were fine.

Fix by teaching ginNewScanKey to place all excludeOnly scan keys
after all not-excludeOnly ones.  So far as we know at present,
it might be sufficient to avoid the case where the very first
scan key is excludeOnly; but I'm not very convinced that there
aren't other dependencies on the ordering.

Bug: #19031
Reported-by: Tim Wood <[email protected]>
Author: Tom Lane <[email protected]>
Discussion: https://postgr.es/m/[email protected]
Backpatch-through: 13
I think the error message for a different condition was inadvertently
copied.

This problem seems to have been introduced by commit a4d75c8.

Author: Álvaro Herrera <[email protected]>
Reported-by: jian he <[email protected]>
Reviewed-by: Tom Lane <[email protected]>
Backpatch-through: 14
Discussion: https://postgr.es/m/CACJufxEZ48toGH0Em_6vdsT57Y3L8pLF=DZCQ_gCii6=C3MeXw@mail.gmail.com
It's possible that if the only live partition is concurrently dropped
and try_table_open() fails, that the bms_del_member() will pfree the
live_parts Bitmapset.  Since the bms_del_member() call does not assign
the result back to the live_parts local variable, the while loop could
segfault as that variable would still reference the pfree'd Bitmapset.

Backpatch to 15. 52f3de8 was backpatched to 14, but there's no
bms_del_member() there due to live_parts not yet existing in RelOptInfo in
that version.  Technically there's no bug in version 15 as
bms_del_member() didn't pfree when the set became empty prior to
00b4146 (from v16).  Applied to v15 anyway to keep the code similar and
to avoid the bad coding pattern.

Author: Bernd Reiß <[email protected]>
Reviewed-by: David Rowley <[email protected]>
Discussion: https://postgr.es/m/[email protected]
Backpatch-through: 15
PQtrace() was generating its output for non-printable characters without
casting the characters printed with unsigned char, leading to some extra
"\xffffff" generated in the output due to the fact that char may be
signed.

Oversights introduced by commit 198b371, so backpatch down to v14.

Author: Ran Benita <[email protected]>
Discussion: https://postgr.es/m/[email protected]
Backpatch-through: 14
SubPlan nodes are typically built very early, before any RelOptInfos
have been constructed for the parent query level.  As a result, the
simple_rel_array in the parent root has not yet been initialized.
Currently, during cost estimation of a SubPlan's testexpr, we may call
examine_variable() to look up statistical data about the expressions.
This can lead to "no relation entry for relid" errors.

To fix, pass root as NULL to cost_qual_eval() in cost_subplan(), since
the root does not yet contain enough information to safely consult
statistics.

One exception is SubPlan nodes built for the initplans of MIN/MAX
aggregates from indexes.  In this case, having a NULL root is safe
because testexpr will be NULL.  Additionally, an initplan will by
definition not consult anything from the parent plan.

Backpatch to all supported branches.  Although the reported call path
that triggers this error is not reachable prior to v17, there's no
guarantee that other code paths -- especially in extensions -- could
not encounter the same issue when cost_qual_eval() is called with a
root that lacks a valid simple_rel_array.  The test case is not
included in pre-v17 branches though.

Bug: #19037
Reported-by: Alexander Lakhin <[email protected]>
Diagnosed-by: Tom Lane <[email protected]>
Author: Richard Guo <[email protected]>
Reviewed-by: Tom Lane <[email protected]>
Discussion: https://postgr.es/m/[email protected]
Backpatch-through: 13
If an INSERT has an ON CONFLICT DO UPDATE clause, the executor must
check that the target relation supports UPDATE as well as INSERT. In
particular, it must check that the target relation has a REPLICA
IDENTITY if it publishes updates. Formerly, it was not doing this
check, which could lead to silently breaking replication.

Fix by adding such a check to CheckValidResultRel(), which requires
adding a new onConflictAction argument. In back-branches, preserve ABI
compatibility by introducing a wrapper function with the original
signature.

Author: Zhijie Hou <[email protected]>
Reviewed-by: Ashutosh Bapat <[email protected]>
Reviewed-by: Dean Rasheed <[email protected]>
Tested-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/OS3PR01MB57180C87E43A679A730482DF94B62@OS3PR01MB5718.jpnprd01.prod.outlook.com
Backpatch-through: 13
When executing a MERGE, check that the target relation supports all
actions mentioned in the MERGE command. Specifically, check that it
has a REPLICA IDENTITY if it publishes updates or deletes and the
MERGE command contains update or delete actions. Failing to do this
can silently break replication.

Author: Zhijie Hou <[email protected]>
Reviewed-by: Ashutosh Bapat <[email protected]>
Reviewed-by: Dean Rasheed <[email protected]>
Tested-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/OS3PR01MB57180C87E43A679A730482DF94B62@OS3PR01MB5718.jpnprd01.prod.outlook.com
Backpatch-through: 15
Per buildfarm member wrasse, void function cannot return a value.
This only affects v13-v17, where an ABI-compatible wrapper function
was added.

Backpatch-through: 13-17
When executing a MERGE UPDATE action, if there is more than one
concurrent update of the target row, the lock-and-retry code would
sometimes incorrectly identify the latest version of the target tuple,
leading to incorrect results.

This was caused by using the ctid field from the TM_FailureData
returned by table_tuple_lock() in a case where the result was TM_Ok,
which is unsafe because the TM_FailureData struct is not guaranteed to
be fully populated in that case. Instead, it should use the tupleid
passed to (and updated by) table_tuple_lock().

To reduce the chances of similar errors in the future, improve the
commentary for table_tuple_lock() and TM_FailureData to make it
clearer that table_tuple_lock() updates the tid passed to it, and most
fields of TM_FailureData should not be relied on in non-failure cases.
An exception to this is the "traversed" field, which is set in both
success and failure cases.

Reported-by: Dmitry <[email protected]>
Author: Yugo Nagata <[email protected]>
Reviewed-by: Dean Rasheed <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/[email protected]
Backpatch-through: 15
A new pgstats entry is created as a two-step process:
- The entry is looked at in the shared hashtable of pgstats, and is
inserted if not found.
- When not found and inserted, its fields are then initialized.  This
part include a DSA chunk allocation for the stats data of the new entry.

As currently coded, if the DSA chunk allocation fails due to an
out-of-memory failure, an ERROR is generated, leaving in the pgstats
shared hashtable an inconsistent entry due to the first step, as the
entry has already been inserted in the hashtable.  These broken entries
can then be found by other backends, crashing them.

There are only two callers of pgstat_init_entry(), when loading the
pgstats file at startup and when creating a new pgstats entry.  This
commit changes pgstat_init_entry() so as we use dsa_allocate_extended()
with DSA_ALLOC_NO_OOM, making it return NULL on allocation failure
instead of failing.  This way, a backend failing an entry creation can
take appropriate cleanup actions in the shared hashtable before throwing
an error.  Currently, this means removing the entry from the shared
hashtable before throwing the error for the allocation failure.

Out-of-memory errors unlikely happen in the wild, and we do not bother
with back-patches when these are fixed, usually.  However, the problem
dealt with here is a degree worse as it breaks the shared memory state
of pgstats, impacting other processes that may look at an inconsistent
entry that a different process has failed to create.

Author: Mikhail Kot <[email protected]>
Discussion: https://postgr.es/m/CAAi9E7jELo5_-sBENftnc2E8XhW2PKZJWfTC3i2y-GMQd2bcqQ@mail.gmail.com
Backpatch-through: 15
If the hash functions used for hashing tuples leaked any memory,
we failed to clean that up, resulting in query-lifespan memory
leakage in queries using hashed subplans.  One way that could
happen is if the values being hashed require de-toasting, since
most of our hash functions don't trouble to clean up de-toasted
inputs.

Prior to commit bf6c614, this leakage was largely masked
because TupleHashTableMatch would reset hashtable->tempcxt
(via execTuplesMatch).  But it doesn't do that anymore, and
that's not really the right place for this anyway: doing it
there could reset the tempcxt many times per hash lookup,
or not at all.  Instead put reset calls into ExecHashSubPlan
and buildSubPlanHash.  Along the way to that, rearrange
ExecHashSubPlan so that there's just one place to call
MemoryContextReset instead of several.

This amounts to accepting the de-facto API spec that the caller
of the TupleHashTable routines is responsible for resetting the
tempcxt adequately often.  Although the other callers seem to
get this right, it was not documented anywhere, so add a comment
about it.

Bug: #19040
Reported-by: Haiyang Li <[email protected]>
Author: Haiyang Li <[email protected]>
Reviewed-by: Fei Changhong <[email protected]>
Reviewed-by: Tom Lane <[email protected]>
Discussion: https://postgr.es/m/[email protected]
Backpatch-through: 13
hash_xlog.h included descriptions for the blocks used in WAL records
that were was not completely consistent with how the records are
generated, with one block missing for SQUEEZE_PAGE, and inconsistent
descriptions used for block 0 in VACUUM_ONE_PAGE and MOVE_PAGE_CONTENTS.

This information was incorrect since c11453c, cross-checking the
logic for the record generation.

Author: Kirill Reshke <[email protected]>
Reviewed-by: Andrey Borodin <[email protected]>
Discussion: https://postgr.es/m/CALdSSPj1j=a1d1hVA3oabRFz0hSU3KKrYtZPijw4UPUM7LY9zw@mail.gmail.com
Backpatch-through: 13
pg_event_trigger_dropped_objects() would report a column default
object with is_temporary = false, even if it belongs to a
temporary table.  This seems clearly wrong, so adjust it to
report the table's temp-ness.

While here, refactor EventTriggerSQLDropAddObject to make its
handling of namespace objects less messy and avoid duplication
of the schema-lookup code.  And add some explicit test coverage
of dropped-object reports for dependencies of temp tables.

Back-patch to v15.  The bug exists further back, but the
GetAttrDefaultColumnAddress function this patch depends on does not,
and it doesn't seem worth adjusting it to cope with the older code.

Author: Antoine Violin <[email protected]>
Co-authored-by: Tom Lane <[email protected]>
Discussion: https://postgr.es/m/CAFjUV9x3-hv0gihf+CtUc-1it0hh7Skp9iYFhMS7FJjtAeAptA@mail.gmail.com
Backpatch-through: 15
Commit a0b99fc caused pg_event_trigger_dropped_objects()
to not fill the object_name field for schemas, which it
should have; and caused it to fill the object_name field
for default values, which it should not have.

In addition, triggers and RLS policies really should behave
the same way as we're making column defaults do; that is,
they should have is_temporary = true if they belong to a
temporary table.

Fix those things, and upgrade event_trigger.sql's woefully
inadequate test coverage of these secondary output columns.

As before, back-patch only to v15.

Reported-by: Sergey Shinderuk <[email protected]>
Author: Tom Lane <[email protected]>
Discussion: https://postgr.es/m/[email protected]
Backpatch-through: 15
Commit e3ffc3e fixed the translation of character classes in
SIMILAR TO regular expressions.  Unfortunately the fix broke a corner
case: if there is an escape character right after the opening bracket
(for example in "[\q]"), a closing bracket right after the escape
sequence would not be seen as closing the character class.

There were two more oversights: a backslash or a nested opening bracket
right at the beginning of a character class should remove the special
meaning from any following caret or closing bracket.

This bug suggests that this code needs to be more readable, so also
rename the variables "charclass_depth" and "charclass_start" to
something more meaningful, rewrite an "if" cascade to be more
consistent, and improve the commentary.

Reported-by: Dominique Devienne <[email protected]>
Reported-by: Stephan Springl <[email protected]>
Author: Laurenz Albe <[email protected]>
Reviewed-by: Tom Lane <[email protected]>
Discussion: https://postgr.es/m/CAFCRh-8NwJd0jq6P=R3qhHyqU7hw0BTor3W0SvUcii24et+zAw@mail.gmail.com
Backpatch-through: 13
LLVM-21 renamed llvm::GlobalValue::getGUID() to
getGUIDAssumingExternalLinkage(), so add a version guard.

Author: Holger Hoffstätte <[email protected]>
Discussion: https://www.postgresql.org/message-id/flat/d25e6e4a-d1b4-84d3-2f8a-6c45b975f53d%40applied-asynchrony.com
alvherre and others added 19 commits November 4, 2025 17:30
Accidental omission in commit aa2ba50.  There are too many lists of
these variables ...

Discussion: https://postgr.es/m/[email protected]
Backpatch commit 7bc9a8b to 13-17. The motivation for backpatching is that
we want to update CI to Debian Trixie. Trixie contains a newer mingw
installation, which would trigger the warning addressed by 7bc9a8b. The
risk of backpatching seems fairly low, given that it did not cause issues in
the branches the commit is already present.

While CI is not present in 13-14, it seems better to be consistent across
branches.

Author: Thomas Munro <[email protected]>
Discussion: https://postgr.es/m/o5yadhhmyjo53svzwvaocww6zkrp63i4f32cw3treuh46pxtza@hyqio5b2tkt6
Backpatch-through: 13
Debian Trixie CI images are generated now [1], so use them with the
following changes:

- detect_stack_use_after_return=0 option is added to the ASAN_OPTIONS
  because ASAN uses a "shadow stack" to track stack variable lifetimes
  and this confuses Postgres' stack depth check [2].

- Perl is updated to the newer version (perl5.40-i386-linux-gnu).

- LLVM-14 is no longer default installation, no need to force using
  LLVM-16.

- Switch MinGW CC/CXX to x86_64-w64-mingw32ucrt-* to fix build failure
  from missing _iswctype_l in mingw-w64 v12 headers.

[1] anarazel/pg-vm-images@35a144793f
[2] https://postgr.es/m/20240130212304.q66rquj5es4375ab%40awork3.anarazel.de

Author: Nazir Bilal Yavuz <[email protected]>
Discussion: https://postgr.es/m/CAN55FZ1_B1usTskAv+AYt1bA7abVd9YH6XrUUSbr-2Z0d5Wd8w@mail.gmail.com
Backpatch: 15-, where CI support was added
Commit a95e3d8 added ActiveSnapshot push+pop when processing
work-items (BRIN autosummarization), but forgot to handle the case of
a transaction failing during the run, which drops the snapshot untimely.
Fix by making the pop conditional on an element being actually there.

Author: Álvaro Herrera <[email protected]>
Backpatch-through: 13
Discussion: https://postgr.es/m/[email protected]
In 2a0faed, which added JIT compilation support for expressions, I
accidentally used sizeof(LLVMBasicBlockRef *) instead of
sizeof(LLVMBasicBlockRef) as part of computing the size of an allocation. That
turns out to have no real negative consequences due to LLVMBasicBlockRef being
a pointer itself (and thus having the same size). It still is wrong and
confusing, so fix it.

Reported by coverity.

Backpatch-through: 13
The test introduced by 17b2d5e verifies that a WAL receiver
survives across a timeline jump by searching the server logs for
termination messages.  However, it called restart() before the timeline
switch, which kills the WAL receiver and may log the exact message being
checked, hence failing the test.  As TAP tests reuse the same log file
across restarts, a rotate_logfile() is used before the restart so as the
log matching check is not impacted by log entries generated by a
previous shutdown.

Recent changes to file handle inheritance altered I/O timing enough to
make this fail consistently while testing another patch.

While on it, this adds an extra check based on a PID comparison.  This
test may lead to false positives as it could be possible that the WAL
receiver has processed a timeline jump before the initial PID is
grabbed, but it should be good enough in most cases.

Like 17b2d5e, backpatch down to v13.

Author: Bryan Green <[email protected]>
Co-authored-by: Xuneng Zhou <[email protected]>
Discussion: https://postgr.es/m/[email protected]
Backpatch-through: 13
If any shell command fails, the whole script should fail.  To avoid
future omissions, add this even for single-command scripts that use su
with heredoc syntax, as they might be extended or copied-and-pasted.

Extracted from a larger patch that wanted to use #error during
compilation, leading to the diagnosis of this problem.

Reviewed-by: Tristan Partin <[email protected]> (earlier version)
Discussion: https://postgr.es/m/DDZP25P4VZ48.3LWMZBGA1K9RH%40partin.io
Backpatch-through: 15
postgres_fdw supports EvalPlanQual testing by using the infrastructure
provided by the core with the RecheckForeignScan callback routine (cf.
commits 5fc4c26 and 385f337), but there has been no test coverage
for that, except that recent commit 12609fb, which fixed an issue in
commit 385f337, added a test case to exercise only a code path added
by that commit to the core infrastructure.  So let's add test cases to
exercise other code paths as well at this time.

Like commit 12609fb, back-patch to all supported branches.

Reported-by: Masahiko Sawada <[email protected]>
Author: Etsuro Fujita <[email protected]>
Discussion: https://postgr.es/m/CAPmGK15%2B6H%3DkDA%3D-y3Y28OAPY7fbAdyMosVofZZ%2BNc769epVTQ%40mail.gmail.com
Backpatch-through: 13
Commit 27cc7cd removed the epqScanDone flag from the EState struct,
and instead added an equivalent flag named relsubs_done to the EPQState
struct; but it failed to update this comment.

Author: Etsuro Fujita <[email protected]>
Discussion: https://postgr.es/m/CAPmGK152zJ3fU5avDT5udfL0namrDeVfMTL3dxdOXw28SOrycg%40mail.gmail.com
Backpatch-through: 13
Stored generated columns are not yet computed when the filtering
happens, so we need to prohibit them to avoid incorrect behavior.

Co-authored-by: jian he <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Masahiko Sawada <[email protected]>
Discussion: https://www.postgresql.org/message-id/flat/CACJufxHb8YPQ095R_pYDr77W9XKNaXg5Rzy-WP525mkq+hRM3g@mail.gmail.com
XLogRecPtrIsInvalid() is inconsistent with the affirmative form of
macros used for other datatypes, and leads to awkward double negatives
in a few places.  This commit introduces XLogRecPtrIsValid(), which
allows code to be written more naturally.

This patch only adds the new macro.  XLogRecPtrIsInvalid() is left in
place, and all existing callers remain untouched.  This means all
supported branches can accept hypothetical bug fixes that use the new
macro, and at the same time any code that compiled with the original
formulation will continue to silently compile just fine.

Author: Bertrand Drouvot <[email protected]>
Backpatch-through: 13
Discussion: https://postgr.es/m/[email protected]
The following parameters can only be set at server start because
their context is PGC_POSTMASTER, but this information was missing
or incorrectly documented. This commit adds or corrects
that information for the following parameters:

* debug_io_direct
* dynamic_shared_memory_type
* event_source
* huge_pages
* io_max_combine_limit
* max_notify_queue_pages
* shared_memory_type
* track_commit_timestamp
* wal_decode_buffer_size

Backpatched to all supported branches.

Author: Karina Litskevich <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Reviewed-by: Fujii Masao <[email protected]>
Discussion: https://postgr.es/m/CAHGQGwGfPzcin-_6XwPgVbWTOUFVZgHF5g9ROrwLUdCTfjy=0A@mail.gmail.com
Backpatch-through: 13
generic-gcc.h maps our read and write barriers to C11 acquire and
release fences using compiler builtins, for platforms where we don't
have our own hand-rolled assembler.  This is apparently enough for GCC,
but the C11 memory model is only defined in terms of atomic accesses,
and our barriers for non-atomic, non-volatile accesses were not always
respected under Clang's stricter interpretation of the standard.

This explains the occasional breakage observed on new RISC-V + Clang
animal greenfly in lock-free PgAioHandle manipulation code containing a
repeating pattern of loads and read barriers.  The problem can also be
observed in code generated for MIPS and LoongAarch, though we aren't
currently testing those with Clang, and on x86, though we use our own
assembler there.  The scariest aspect is that we use the generic version
on very common ARM systems, but it doesn't seem to reorder the relevant
code there (or we'd have debugged this long ago).

Fix by inserting an explicit compiler barrier.  It expands to an empty
assembler block declared to have memory side-effects, so registers are
flushed and reordering is prevented.  In those respects this is like the
architecture-specific assembler versions, but the compiler is still in
charge of generating the appropriate fence instruction.  Done for write
barriers on principle, though concrete problems have only been observed
with read barriers.

Reported-by: Alexander Lakhin <[email protected]>
Tested-by: Alexander Lakhin <[email protected]>
Discussion: https://postgr.es/m/d79691be-22bd-457d-9d90-18033b78c40a%40gmail.com
Backpatch-through: 13
Source-Git-URL: https://git.postgresql.org/git/pgtranslation/messages.git
Source-Git-Hash: 5a9d74adc04e700403c23298de44bceb39339c3a
Several functions could overflow their size calculations, when presented
with very large inputs from remote and/or untrusted locations, and then
allocate buffers that were too small to hold the intended contents.

Switch from int to size_t where appropriate, and check for overflow
conditions when the inputs could have plausibly originated outside of
the libpq trust boundary. (Overflows from within the trust boundary are
still possible, but these will be fixed separately.) A version of
add_size() is ported from the backend to assist with code that performs
more complicated concatenation.

Reported-by: Aleksey Solovev (Positive Technologies)
Reviewed-by: Noah Misch <[email protected]>
Reviewed-by: Álvaro Herrera <[email protected]>
Security: CVE-2025-12818
Backpatch-through: 13
This omission allowed table owners to create statistics in any
schema, potentially leading to unexpected naming conflicts.  For
ALTER TABLE commands that require re-creating statistics objects,
skip this check in case the user has since lost CREATE on the
schema.  The addition of a second parameter to CreateStatistics()
breaks ABI compatibility, but we are unaware of any impacted
third-party code.

Reported-by: Jelte Fennema-Nio <[email protected]>
Author: Jelte Fennema-Nio <[email protected]>
Co-authored-by: Nathan Bossart <[email protected]>
Reviewed-by: Noah Misch <[email protected]>
Reviewed-by: Álvaro Herrera <[email protected]>
Security: CVE-2025-12817
Backpatch-through: 13
@mrdrivingduck mrdrivingduck self-assigned this Nov 19, 2025
@CLAassistant
Copy link

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ mrdrivingduck
❌ tglsfdc
You have signed the CLA already but the status is still pending? Let us recheck it.

@mrdrivingduck mrdrivingduck force-pushed the merge_15_15 branch 3 times, most recently from b257438 to 3fa6fbb Compare November 19, 2025 03:15
@mrdrivingduck mrdrivingduck merged commit 3894805 into POLARDB_15_STABLE Nov 20, 2025
34 of 35 checks passed
@mrdrivingduck mrdrivingduck deleted the merge_15_15 branch November 20, 2025 08:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.