
@JustinSongXh

Corresponds to issue #1119.

What does this PR do?

Type of Change

  • Cherry-pick the JSON log format implementation from PostgreSQL 15.
  • Adapt the JSON log fields to be consistent with the CSV log fields.
  • Support configuring JSON output through both gp_log_format and log_destination.

Impact

Performance:
I/O load increases if users write logs in both CSV and JSON formats.

User-facing changes:
This re-enables the log_destination GUC, which Greenplum had made defunct. Users can then change the log format through either log_destination or gp_log_format, which may be confusing.

Checklist


michaelpq and others added 7 commits May 22, 2025 11:37
The previous protocol expected a set of matching characters to check if
a message sent was the last one or not, that changed depending on the
destination wanted:
- 't' and 'f' tracked the last message of a log sent to stderr.
- 'T' and 'F' tracked the last message of a log sent to csvlog.

This could be extended with more characters when introducing new
destinations, but using a bitmask is much more elegant.  This commit
changes the protocol so that a bitmask is used in the header of a log
chunk message sent to the syslogger, with the following options
available for now:
- log_destination as stderr.
- log_destination as csvlog.
- if a message is the last chunk of a message.

Sehrope found this issue in a patch set to introduce JSON as an option
for log_destination, but his patch made the size of the protocol header
larger.  This commit keeps the same size as the original, and adapts the
protocol as wanted.

Thanks also to Andrew Dunstan and Greg Stark for the discussion.

Author: Michael Paquier, Sehrope Sarkuni
Discussion: https://postgr.es/m/CAH7T-aqswBM6JWe4pDehi1uOiufqe06DJWaU5=X7dDLyqUExHg@mail.gmail.com
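A minimal sketch of the bitmask header described above. The flag names and values follow PostgreSQL 15's syslogger.h as far as I know, but treat them as illustrative; `ChunkHeader`, `make_chunk_header()` and the two accessors are hypothetical simplifications of the real `PipeProtoHeader` (no payload field). The point is that one flags byte replaces the old 't'/'f'/'T'/'F' character, so the header size is unchanged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Flag bits in the chunk header (illustrative values). */
#define PIPE_PROTO_IS_LAST      0x01  /* last chunk of a message */
#define PIPE_PROTO_DEST_STDERR  0x10  /* message targets stderr */
#define PIPE_PROTO_DEST_CSVLOG  0x20  /* message targets csvlog */

/* Simplified chunk header: one flags byte where the old char used to be. */
typedef struct
{
    char     nuls[2];  /* always \0\0, marks a protocol chunk */
    uint16_t len;      /* size of this chunk's data */
    int32_t  pid;      /* writer's pid */
    uint8_t  flags;    /* bitmask of PIPE_PROTO_* */
} ChunkHeader;

/* Build the header for one chunk of a message (hypothetical helper). */
static ChunkHeader
make_chunk_header(int32_t pid, uint16_t len, uint8_t dest, bool is_last)
{
    ChunkHeader h;

    /* exactly one destination bit per message */
    assert(dest == PIPE_PROTO_DEST_STDERR || dest == PIPE_PROTO_DEST_CSVLOG);
    memset(&h, 0, sizeof(h));  /* also sets nuls to \0\0 */
    h.len = len;
    h.pid = pid;
    h.flags = (uint8_t) (dest | (is_last ? PIPE_PROTO_IS_LAST : 0));
    return h;
}

/* The receiver tests bits instead of comparing against magic characters. */
static bool
chunk_is_last(const ChunkHeader *h)
{
    return (h->flags & PIPE_PROTO_IS_LAST) != 0;
}

static uint8_t
chunk_dest(const ChunkHeader *h)
{
    return h->flags & (PIPE_PROTO_DEST_STDERR | PIPE_PROTO_DEST_CSVLOG);
}
```

Adding a destination then only means reserving another bit, rather than inventing another pair of characters.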
These are added in the existing tests of pg_ctl for log rotation, that
already tested stderr.  The same amount of coverage is added for csvlog:
- Checks for pg_current_logfile().
- Log rotation with expected file name.
- Log contents generated.

This test is refactored to minimize the amount of work required to add
tests for new log formats, easing some upcoming work.

Author: Michael Paquier, Sehrope Sarkuni
Discussion: https://postgr.es/m/CAH7T-aqswBM6JWe4pDehi1uOiufqe06DJWaU5=X7dDLyqUExHg@mail.gmail.com
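pg_current_logfile() reports what the syslogger recorded in the `current_logfiles` file of the data directory, one `<destination> <path>` pair per line; this is what the pg_current_logfile() checks exercise for each format. A minimal sketch of that lookup, where `find_current_logfile()` is a hypothetical name and the file contents are passed in as a string:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Find the path recorded for `dest` ("stderr", "csvlog", ...) in the
 * contents of a current_logfiles file.  Copies it into `out` and returns
 * true on success; returns false if absent or if `out` is too small.
 */
static bool
find_current_logfile(const char *contents, const char *dest,
                     char *out, size_t outlen)
{
    size_t      destlen = strlen(dest);
    const char *line = contents;

    assert(out != NULL && outlen > 0);
    while (*line != '\0')
    {
        const char *eol = strchr(line, '\n');
        size_t      linelen = eol ? (size_t) (eol - line) : strlen(line);

        /* a match looks like "<dest> <path>" */
        if (linelen > destlen && strncmp(line, dest, destlen) == 0 &&
            line[destlen] == ' ')
        {
            size_t pathlen = linelen - destlen - 1;

            if (pathlen >= outlen)
                return false;  /* caller's buffer too small */
            memcpy(out, line + destlen + 1, pathlen);
            out[pathlen] = '\0';
            return true;
        }
        if (eol == NULL)
            break;
        line = eol + 1;
    }
    return false;
}
```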
… case

send_message_to_server_log() would force a redirection of a log entry to
stderr in some cases for csvlog, like the syslogger not being available
yet.  If this happens, csvlog would fall back to stderr to log
some information rather than nothing.  The code was organized so that
stderr was handled before csvlog, with csvlog checking that stderr had not
happened yet with a reversed condition.  With this code organization, it
could be possible to lose some messages if running Postgres as a service
on WIN32, as there is no usable stderr, and the handling of the
StringInfoData holding the message for stderr was rather confusing
because of that.

This commit moves the csvlog handling before stderr, so that we are
able to track whether it is necessary to log something to stderr.  This
reduces the handling of stderr to a single code path, adding a
fallback to event logs for a WIN32 service.  This also simplifies the
way we handle the StringInfoData for stderr, easing the
integration of new file-based log destinations.  I got to play with
services and event logs on Windows while checking this change.

Reviewed-by: Chris Bandy
Discussion: https://postgr.es/m/[email protected]
This refactors the following routines and facilities coming from
elog.c, to ease their use across multiple log destinations:
- Start timestamp, including its reset, to store when a process has been
started.
- The log timestamp, associated with an entry (the same timestamp is used
when logging across multiple destinations).
- Routine deciding if a query can be logged or not.
- The backend type names, depending on the process that logs any
information (postmaster, bgworker name or just GetBackendTypeDesc() with
a regular backend).
- Writing of logs using the piped logging protocol, with the log collector
enabled.
- Error severity converted to a string.

These refactored routines will be used for some follow-up changes
to move all the csvlog logic into its own file and to potentially add
JSON as a log destination, reducing the overall size of elog.c as the end
result.

Author: Michael Paquier, Sehrope Sarkuni
Reviewed-by: Nathan Bossart
Discussion: https://postgr.es/m/CAH7T-aqswBM6JWe4pDehi1uOiufqe06DJWaU5=X7dDLyqUExHg@mail.gmail.com
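The shared log timestamp can be pictured as a per-entry cache: the first destination that asks for the time formats it, later destinations reuse the same string, and the cache is reset before the next entry. A minimal sketch of that idea; `LogTimeCache`, `get_log_time()` and `reset_log_time()` are hypothetical names, the time is passed in explicitly, and UTC is used instead of `log_timezone`.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <time.h>

/* Formatted timestamp shared by all destinations of one log entry. */
typedef struct
{
    char text[64];
    bool valid;
} LogTimeCache;

/* Format `now` on first use; return the cached string afterwards. */
static const char *
get_log_time(LogTimeCache *cache, time_t now)
{
    assert(cache != NULL);
    if (!cache->valid)
    {
        struct tm *tm = gmtime(&now);

        if (tm == NULL ||
            strftime(cache->text, sizeof cache->text,
                     "%Y-%m-%d %H:%M:%S UTC", tm) == 0)
            cache->text[0] = '\0';
        cache->valid = true;
    }
    return cache->text;
}

/* Called before emitting the next entry. */
static void
reset_log_time(LogTimeCache *cache)
{
    cache->valid = false;
    cache->text[0] = '\0';
}
```

Because stderr, csvlog and jsonlog all read from the same cache, their lines for one entry carry identical timestamps even if formatting happens at slightly different moments.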
The recent refactoring done in ac7c807 makes this move possible and
simple, as this just moves some code around.  This reduces the size of
elog.c by 7%.

Author: Michael Paquier, Sehrope Sarkuni
Reviewed-by: Nathan Bossart
Discussion: https://postgr.es/m/CAH7T-aqswBM6JWe4pDehi1uOiufqe06DJWaU5=X7dDLyqUExHg@mail.gmail.com

simply moves the routines related to csvlog into their own file
"jsonlog" is a new value that can be added to log_destination to provide
logs in the JSON format, with its output written to a file, making it
the third type of destination of this kind, after "stderr" and
"csvlog".  The format is convenient to feed logs to other applications.
There is also a plugin external to core that provided this feature using
the hook in elog.c, but this had to overwrite the output of "stderr" to
work, so being able to do both at the same time was not possible.  The
files generated by this log format are suffixed with ".json", and use
the same rotation policies as the other two formats depending on the
backend configuration.
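A minimal sketch of how a destination-specific file name can be derived from the base log file name, modeled on the suffix handling in PostgreSQL's syslogger (`logfile_getname()`): a trailing ".log" is replaced by the suffix, otherwise the suffix is appended. `logfile_with_suffix()` is a hypothetical helper operating on an already-expanded file name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Write `base` with `suffix` (".csv", ".json") into `out`.
 * Returns 0 on success, -1 if `out` is too small.
 */
static int
logfile_with_suffix(const char *base, const char *suffix,
                    char *out, size_t outlen)
{
    size_t len = strlen(base);

    assert(out != NULL && outlen > 0);
    if (len > 4 && strcmp(base + len - 4, ".log") == 0)
        len -= 4;  /* drop ".log" so the suffix replaces it */
    if (len + strlen(suffix) + 1 > outlen)
        return -1;
    memcpy(out, base, len);
    strcpy(out + len, suffix);
    return 0;
}
```

So "postgresql-2025-05-22.log" rotates alongside "postgresql-2025-05-22.json", while a name without ".log" simply gains the suffix.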

This takes advantage of the refactoring work done previously in ac7c807,
bed6ed3, 8b76f89 and 2d77d83 for the backend parts, and 72b76f7 for the
TAP tests, making the addition of any new file-based format rather
straightforward.

The documentation is updated to list all the keys and the values that
can exist in this new format.  pg_current_logfile() also required a
refresh for the new option.
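Each jsonlog entry is one JSON object per line, with string values escaped and keys that have no value omitted. A minimal sketch of building such a line for a few of the documented keys (`timestamp`, `pid`, `error_severity`, `message`); `Buf`, `format_json_entry()` and the other helpers are hypothetical, and every value is emitted as a string for brevity, whereas PostgreSQL writes numeric fields such as `pid` unquoted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Fixed-size output buffer; `overflow` is set if output was truncated. */
typedef struct
{
    char   data[1024];
    size_t len;
    int    overflow;
} Buf;

static void
buf_putc(Buf *b, char c)
{
    if (b->len + 1 < sizeof b->data)
    {
        b->data[b->len++] = c;
        b->data[b->len] = '\0';
    }
    else
        b->overflow = 1;
}

static void
buf_puts(Buf *b, const char *s)
{
    while (*s)
        buf_putc(b, *s++);
}

/* Emit s as a JSON string, escaping quotes, backslashes, control chars. */
static void
buf_put_json_string(Buf *b, const char *s)
{
    buf_putc(b, '"');
    for (; *s; s++)
    {
        unsigned char c = (unsigned char) *s;

        switch (c)
        {
            case '"':  buf_puts(b, "\\\""); break;
            case '\\': buf_puts(b, "\\\\"); break;
            case '\n': buf_puts(b, "\\n"); break;
            case '\r': buf_puts(b, "\\r"); break;
            case '\t': buf_puts(b, "\\t"); break;
            default:
                if (c < 0x20)
                {
                    char esc[7];

                    snprintf(esc, sizeof esc, "\\u%04x", c);
                    buf_puts(b, esc);
                }
                else
                    buf_putc(b, (char) c);
        }
    }
    buf_putc(b, '"');
}

/* Append "key":"value"; NULL values are skipped entirely. */
static void
buf_put_json_field(Buf *b, const char *key, const char *value)
{
    if (value == NULL)
        return;
    if (b->len > 1)  /* something besides the opening brace */
        buf_putc(b, ',');
    buf_put_json_string(b, key);
    buf_putc(b, ':');
    buf_put_json_string(b, value);
}

static void
format_json_entry(Buf *b, const char *timestamp, const char *pid,
                  const char *severity, const char *message)
{
    b->len = 0;
    b->data[0] = '\0';
    b->overflow = 0;
    buf_putc(b, '{');
    buf_put_json_field(b, "timestamp", timestamp);
    buf_put_json_field(b, "pid", pid);
    buf_put_json_field(b, "error_severity", severity);
    buf_put_json_field(b, "message", message);
    buf_puts(b, "}\n");  /* one object per line */
}
```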

Author: Sehrope Sarkuni, Michael Paquier
Reviewed-by: Nathan Bossart, Justin Pryzby
Discussion: https://postgr.es/m/CAH7T-aqswBM6JWe4pDehi1uOiufqe06DJWaU5=X7dDLyqUExHg@mail.gmail.com
1. Support configuring JSON output through both gp_log_format and log_destination.
2. Restructure the JSON log implementation
   to be consistent with the write_syslogger_in_csv logic.
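The PR does not spell out how the two settings interact, so the following is only one plausible reading: gp_log_format contributes a destination bit that is ORed into log_destination. The bit names, the accepted gp_log_format values ("text", "csv", "json") and `effective_destinations()` are all hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical destination bits. */
#define LOGDEST_STDERR   0x01
#define LOGDEST_CSVLOG   0x02
#define LOGDEST_JSONLOG  0x04

/* Hypothetical: merge gp_log_format into the log_destination bitmask. */
static int
effective_destinations(const char *gp_log_format, int log_destination)
{
    int dest = log_destination;

    assert(gp_log_format != NULL);
    if (strcmp(gp_log_format, "csv") == 0)
        dest |= LOGDEST_CSVLOG;
    else if (strcmp(gp_log_format, "json") == 0)
        dest |= LOGDEST_JSONLOG;
    /* "text" adds nothing; log_destination alone decides */
    return dest;
}
```

With an OR like this, neither setting can switch off a format the other one enabled, which is one concrete form of the confusion mentioned under "User-facing changes".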

@github-actions github-actions bot left a comment


Hi, @JustinSongXh welcome!🎊 Thanks for taking the effort to make our project better! 🙌 Keep making such awesome contributions!

@tuhaihe
Member

tuhaihe commented May 22, 2025

Hey @JustinSongXh, there is a proposal to upgrade from PG14 to PG16. Perhaps your work can be part of it: #1095.

@JustinSongXh JustinSongXh changed the title Cherry-pick jsonlog implementation from PostgreSQL 15 WIP: Cherry-pick jsonlog implementation from PostgreSQL 15 May 22, 2025
@JustinSongXh JustinSongXh marked this pull request as draft May 22, 2025 06:14
@leborchuk
Contributor

In CBDB, we also have a set of functions to read CBDB log files using SQL queries.

They use a CSV parser to read external data https://github.com/apache/cloudberry/blob/main/gpcontrib/gp_toolkit/gp_toolkit--1.3.sql#L168.
If we change the log format, we will also need to change the procedure for reading logs.

@avamingli avamingli added the cherry-pick cherry-pick upstream commts label Jun 3, 2025