Merged
16 changes: 13 additions & 3 deletions doc/src/Build_settings.rst
@@ -459,11 +459,21 @@ supports the "popen" function in the standard runtime library.
Read or write compressed files
-----------------------------------------

.. versionchanged:: TBD

Added support for ``brotli`` and ``7-zip``

If this option is enabled, large files can be read or written with
compression by ``gzip`` or similar tools by several LAMMPS commands,
including :doc:`read_data <read_data>`, :doc:`rerun <rerun>`, and
:doc:`dump <dump>`. Supported compression tools and algorithms are currently
``gzip``, ``bzip2``, ``zstd``, ``xz``, ``lz4``, and ``lzma`` (via xz).
including :doc:`read_data <read_data>`, :doc:`write_data <write_data>`,
:doc:`rerun <rerun>`, :doc:`dump <dump>`, and :doc:`write_dump
<write_dump>`. Supported compression tools and algorithms are currently
``gzip``, ``bzip2``, ``zstd``, ``xz``, ``lz4``, ``lzma`` (via xz),
``brotli``, and ``7-zip`` (via 7z). LAMMPS checks at runtime which
compression commands are available and adjusts the set of recognized
suffixes accordingly. The list of available compression formats and
suffixes is shown when running LAMMPS with the :doc:`-help or -h
command-line flag <Run_options>`.

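The runtime check described above can be pictured as filtering the suffix table by whether each compression program is found in the search path. The sketch below is illustrative only: it assumes a plain POSIX `PATH` lookup as a stand-in for LAMMPS's own `find_exe_path()`, and `CompressTool`, `on_path()`, and `available_suffixes()` are hypothetical names.

```cpp
#include <cstdlib>
#include <filesystem>
#include <functional>
#include <sstream>
#include <string>
#include <system_error>
#include <vector>

// Hypothetical pairing of a filename suffix with the executable it needs.
struct CompressTool {
  std::string extension;  // suffix without the leading dot, e.g. "gz"
  std::string command;    // program that must be present at runtime, e.g. "gzip"
};

// POSIX-only stand-in for a PATH search: true if 'exe' exists as a regular
// file in one of the ':'-separated PATH directories (permissions not checked).
bool on_path(const std::string &exe) {
  const char *path = std::getenv("PATH");
  if (path == nullptr) return false;
  std::istringstream dirs(path);
  std::string dir;
  while (std::getline(dirs, dir, ':')) {
    std::error_code ec;
    if (!dir.empty() && std::filesystem::is_regular_file(std::filesystem::path(dir) / exe, ec))
      return true;
  }
  return false;
}

// Keep only the suffixes whose compression command was found.
std::vector<std::string> available_suffixes(
    const std::vector<CompressTool> &tools,
    const std::function<bool(const std::string &)> &found = on_path) {
  std::vector<std::string> suffixes;
  for (const auto &tool : tools)
    if (found(tool.command)) suffixes.push_back(tool.extension);
  return suffixes;
}
```

Because several suffixes can share one program (``.xz`` and ``.lzma`` both use ``xz`` in the `compress_styles` table), finding a single executable may enable more than one suffix.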
.. tabs::

36 changes: 19 additions & 17 deletions doc/src/dump.rst
@@ -210,7 +210,7 @@ support these options; see details on the :doc:`dump_modify
<dump_modify>` doc page.

As described below, the filename determines the kind of output: text
or binary or gzipped, one big file or one per timestep, one file for
or binary or compressed, one big file or one per timestep, one file for
all the processors or multiple smaller files.

.. note::
@@ -659,22 +659,24 @@ the binary file. The format of the binary file can be understood by
looking at the :file:`tools/binary2txt.cpp` file. This option is only
available for the *atom* and *custom* styles.

If the filename ends with ".gz", the dump file (or files, if "\*" or "%"
is also used) is written in gzipped format. A gzipped dump file will be
about :math:`3\times` smaller than the text version, but will also take
longer to write. This option is not available for the *dcd* and *xtc*
styles.
If LAMMPS has been compiled with the :doc:`corresponding setting
<Build_settings>` and if the filename ends with ".gz" or some other
:ref:`supported compression format suffix <gzip>`, the dump file (or
files, if "\*" or "%" is also used) is written in compressed format. A
compressed dump file will be about :math:`3\times` smaller than the text
version, but will also take longer to write. This option is not
available for the *dcd* and *xtc* styles.

Note that styles that end with *gz* are identical in command syntax to
the corresponding styles without "gz", however, they generate
compressed files using the zlib library. Thus the filename suffix
".gz" is mandatory. This is an alternative approach to writing
compressed files via a pipe, as done by the regular dump styles, which
may be required on clusters where the interface to the high-speed
network disallows using the fork() library call (which is needed for a
pipe). For the remainder of this page, you should thus consider the
*atom* and *atom/gz* styles (etc.) to be inter-changeable, with the
exception of the required filename suffix.
the corresponding styles without "gz"; however, they generate compressed
files using the zlib library. Thus the filename suffix ".gz" is
mandatory. This is an alternative to writing compressed files via a
pipe (see above), as done by the regular dump styles, and may be
required on HPC clusters where the interface to the high-speed network
disallows using the fork() library call (which is needed for a pipe).
For the remainder of this page, you should thus consider the *atom* and
*atom/gz* styles (etc.) to be interchangeable, with the exception of
the required filename suffix.

Similarly, styles that end with *zstd* are identical to the gz styles,
but use the Zstd compression library instead and require a ".zst"
@@ -1024,8 +1026,8 @@ to effectively specify multiple values.
Restrictions
""""""""""""

To write gzipped dump files, you must either compile LAMMPS with the
-DLAMMPS_GZIP option or use the styles from the COMPRESS package.
To write compressed dump files, you must either compile LAMMPS with the
``-DLAMMPS_GZIP`` option or use the styles from the COMPRESS package.
See the :doc:`Build settings <Build_settings>` page for details.

While a dump command is active (i.e., has not been stopped by using
22 changes: 11 additions & 11 deletions doc/src/dump_image.rst
@@ -216,17 +216,17 @@ Here are two sample images, rendered as :math:`1024\times 1024` JPEG files.

Only atoms in the specified group are rendered in the image. The
:doc:`dump_modify region and thresh <dump_modify>` commands can also
alter what atoms are included in the image.
The filename suffix determines whether a JPEG, PNG, or PPM file is
created with the *image* dump style. If the suffix is ".jpg" or
".jpeg", then a `JPEG format <jpeg_format_>`_ file is created, if the
suffix is ".png", then a `PNG format <png_format_>`_ is created, else
a `PPM (aka NETPBM) format <ppm_format_>`_ file is created.
The JPEG and PNG files are binary; PPM has a text mode header followed
by binary data. JPEG images have lossy compression, PNG has lossless
compression, and PPM files are uncompressed but can be compressed with
gzip, if LAMMPS has been compiled with -DLAMMPS_GZIP and a ".gz" suffix
is used.
alter what atoms are included in the image. The filename suffix
determines whether a JPEG, PNG, or PPM file is created with the *image*
dump style. If the suffix is ".jpg" or ".jpeg", then a `JPEG format
<jpeg_format_>`_ file is created; if the suffix is ".png", then a `PNG
format <png_format_>`_ file is created; otherwise a `PPM (aka NETPBM)
format <ppm_format_>`_ file is created. The JPEG and PNG files are binary; PPM
has a text mode header followed by binary data. JPEG images have lossy
compression, PNG has lossless compression, and PPM files are
uncompressed but can be compressed with a supported compression program,
if LAMMPS has been compiled with :ref:`compression support <gzip>` and a
supported suffix is used.

.. _jpeg_format: https://jpeg.org/jpeg/
.. _png_format: https://en.wikipedia.org/wiki/Portable_Network_Graphics
4 changes: 2 additions & 2 deletions doc/src/dump_modify.rst
@@ -150,7 +150,7 @@ various dump styles, including the :doc:`dump image <dump_image>` and

The *append* keyword applies to all dump styles except *cfg* and *xtc*
and *dcd*\ . It also applies only to text output files, not to binary
or gzipped or image/movie files. If specified as *yes*, then dump
or compressed or image/movie files. If specified as *yes*, then dump
snapshots are appended to the end of an existing dump file. If
specified as *no*, then a new dump file will be created which will
overwrite an existing file with the same name.
@@ -170,7 +170,7 @@ dump file has been opened, this keyword has no further effect.

The *buffer* keyword applies only to dump styles *atom*, *cfg*,
*custom*, *local*, and *xyz*\ . It also applies only to text output
files, not to binary or gzipped files. If specified as *yes*, which
files, not to binary or compressed files. If specified as *yes*, which
is the default, then each processor writes its output into an internal
text buffer, which is then sent to the processor(s) which perform file
writes, and written by those processor(s) as one large chunk of text.
12 changes: 7 additions & 5 deletions doc/src/fix_reaxff_bonds.rst
@@ -52,9 +52,10 @@ The meaning of the column header abbreviations is as follows:
* nlp = number of lone pairs
* q = atomic charge

If the filename ends with ".gz", the output file is written in gzipped
format. A gzipped dump file will be about 3x smaller than the text
version, but will also take longer to write.
If the filename ends with ".gz" or some :ref:`other supported
compression format suffix <gzip>`, the output file is written in
compressed format. A compressed output file can be significantly
smaller than the text version, but will also take longer to write.

.. versionadded:: 2Apr2025

@@ -93,8 +94,9 @@ The fix reaxff/bonds command requires that the :doc:`pair_style reaxff
is only enabled if LAMMPS was built with that package. See the
:doc:`Build package <Build_package>` page for more info.

To write gzipped bond files, you must compile LAMMPS with the
-DLAMMPS_GZIP option.
To write compressed bond files, you must compile LAMMPS with the
``-DLAMMPS_GZIP`` option. See the :doc:`Build settings <Build_settings>`
doc page for details.

Related commands
""""""""""""""""
11 changes: 7 additions & 4 deletions doc/src/fix_reaxff_species.rst
@@ -86,9 +86,10 @@ the first line.
calculations, reneighboring only every 100 steps is already quite a
low frequency.

If the filename ends with ".gz", the output file is written in gzipped
format. A gzipped dump file will be about 3x smaller than the text version,
but will also take longer to write.
If the filename ends with ".gz" or some :ref:`other supported
compression format suffix <gzip>`, the output file is written in
compressed format. A compressed output file can be significantly
smaller than the text version, but will also take longer to write.

.. versionadded:: 15Jun2023

@@ -296,7 +297,9 @@ The "fix reaxff/species" requires that :doc:`pair_style reaxff <pair_reaxff>` is
This fix is part of the REAXFF package. It is only enabled if LAMMPS was built with that
package. See the :doc:`Build package <Build_package>` page for more info.

To write gzipped species files, you must compile LAMMPS with the -DLAMMPS_GZIP option.
To write compressed species files, you must compile LAMMPS with the
``-DLAMMPS_GZIP`` option. See the :doc:`Build settings <Build_settings>`
doc page for details.

Related commands
""""""""""""""""
9 changes: 5 additions & 4 deletions doc/src/fix_tmd.rst
@@ -40,8 +40,9 @@ group and the target coordinates listed in file1. Thus a value of
rho_final = 0.0 means move the atoms all the way to the final
structure during the course of the run.

The target file1 can be ASCII text or a gzipped text file (detected by
a .gz suffix). The format of the target file1 is as follows:
The target file1 can be ASCII text or a compressed text file (detected
by a :ref:`supported compression format suffix <gzip>`). The format of
the target file1 is as follows:

.. parsed-literal::

@@ -120,8 +121,8 @@ which a SHAKE fix is applied. This is because LAMMPS assumes there
are not multiple competing holonomic constraints applied to the same
atoms.

To read gzipped target files, you must compile LAMMPS with the
-DLAMMPS_GZIP option. See the :doc:`Build settings <Build_settings>`
To read compressed target files, you must compile LAMMPS with the
``-DLAMMPS_GZIP`` option. See the :doc:`Build settings <Build_settings>`
doc page for details.

Related commands
15 changes: 10 additions & 5 deletions doc/src/neb.rst
@@ -278,11 +278,12 @@ larger than you would normally use for dynamics simulations.
Each file read by the neb command containing atomic coordinates used
to initialize one or more replicas must be formatted as follows.

The file can be ASCII text or a gzipped text file (detected by a .gz
suffix). The file can contain initial blank lines or comment lines
starting with "#" which are ignored. The first non-blank, non-comment
line should list N = the number of lines to follow. The N successive
lines contain the following information:
The file can be ASCII text or a compressed text file (detected by a
:ref:`supported compression format suffix <gzip>`). The file can
contain initial blank lines or comment lines starting with "#" which are
ignored. The first non-blank, non-comment line should list N = the
number of lines to follow. The N successive lines contain the following
information:

.. parsed-literal::

@@ -440,6 +441,10 @@ This command can only be used if LAMMPS was built with the REPLICA
package. See the :doc:`Build package <Build_package>` doc
page for more info.

To read compressed files, you must compile LAMMPS with the
``-DLAMMPS_GZIP`` option. See the :doc:`Build settings
<Build_settings>` doc page for details.

----------

Related commands
9 changes: 5 additions & 4 deletions doc/src/read_data.rst
@@ -62,8 +62,9 @@ Description
"""""""""""

Read in a data file containing information LAMMPS needs to run a
simulation. The file can be ASCII text or a gzipped text file
(detected by a .gz suffix).
simulation. The file can be ASCII text or a compressed text file
(detected by its suffix) if LAMMPS has been compiled with support
for :ref:`compression commands <gzip>`.

This is one of 3 ways to specify the simulation box: see the
:doc:`create_box <create_box>` and :doc:`read_restart <read_restart>`
@@ -1717,8 +1718,8 @@ Translational velocities can also be (re)set by the :doc:`velocity
Restrictions
""""""""""""

To read gzipped data files, you must compile LAMMPS with the
-DLAMMPS_GZIP option. See the :doc:`Build settings <Build_settings>`
To read compressed data files, you must compile LAMMPS with the
``-DLAMMPS_GZIP`` option. See the :doc:`Build settings <Build_settings>`
doc page for details.

Label maps are currently not supported when using the KOKKOS package.
18 changes: 14 additions & 4 deletions doc/src/write_data.rst
@@ -31,16 +31,22 @@ Examples
.. code-block:: LAMMPS
write_data data.polymer
write_data data.polymer.gz
write_data data.*
write_data data.solid triclinic/general
Description
"""""""""""

Write a data file in text format of the current state of the
simulation. Data files can be read by the :doc:`read data <read_data>`
command to begin a simulation. The :doc:`read_data <read_data>` command
also describes their format.
Write a data file in text format with the current state of the simulation.
Data files can be read by the :doc:`read_data <read_data>` command to
begin a simulation.

.. versionadded:: TBD

The file may also be a compressed text file (detected by its suffix) if
LAMMPS has been compiled with support for :ref:`compression commands
<gzip>` and the corresponding compression program is available.

Similar to :doc:`dump <dump>` files, the data filename can contain a "\*"
wild-card character. The "\*" is replaced with the current timestep
@@ -183,6 +189,10 @@ before the data file is written. This means that your system must be
ready to perform a simulation before using this command (force fields
setup, atom masses initialized, etc).

To write compressed data files, you must compile LAMMPS with the
``-DLAMMPS_GZIP`` option. See the :doc:`Build settings
<Build_settings>` doc page for details.

Related commands
""""""""""""""""

27 changes: 23 additions & 4 deletions src/platform.cpp
@@ -83,7 +83,7 @@ namespace {
/// Struct for listing on-the-fly compression/decompression commands
struct compress_info {
/// identifier for the different compression algorithms
enum styles { NONE, GZIP, BZIP2, ZSTD, XZ, LZMA, LZ4 };
enum styles { NONE, GZIP, BZIP2, ZSTD, XZ, LZMA, LZ4, BROTLI, SEVENZIP };
const std::string extension; ///< filename extension for the current algorithm
const std::string command; ///< command to perform compression or decompression
const std::string compressflags; ///< flags to append to compress from stdin to stdout
@@ -100,6 +100,8 @@ const std::vector<compress_info> compress_styles = {
{"xz", "xz", " > ", " -cdf ", compress_info::XZ},
{"lzma", "xz", " --format=lzma > ", " --format=lzma -cdf ", compress_info::LZMA},
{"lz4", "lz4", " > ", " -cdf ", compress_info::LZ4},
{"br", "brotli", " > ", " -cdf ", compress_info::BROTLI},
{"7z", "7z", " a -bb0 -si ", " x -so ", compress_info::SEVENZIP},
};
// clang-format on

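Each table entry is concatenated with the quoted file name to form the shell command handed to popen() in compressed_read() and compressed_write(). Most tools write through shell redirection (`xz > "file"`), whereas 7-Zip receives the archive name as an argument (`7z a -bb0 -si "file"`). A minimal sketch of that string assembly, with a hypothetical `Style` struct standing in for the relevant fields of `compress_info`:

```cpp
#include <string>

// Hypothetical copy of the string fields of compress_info.
struct Style {
  std::string extension, command, compressflags, uncompressflags;
};

// Command used for writing: compressor reads stdin; the quotes let the file
// name contain blanks (embedded double quotes are NOT escaped in this sketch).
std::string write_command(const Style &s, const std::string &file) {
  return s.command + s.compressflags + "\"" + file + "\"";
}

// Command used for reading: decompressor writes the plain text to stdout.
std::string read_command(const Style &s, const std::string &file) {
  return s.command + s.uncompressflags + "\"" + file + "\"";
}
```

With the `xz` entry this yields `xz > "dump.xz"` and `xz -cdf "dump.xz"`; with the `7z` entry it yields `7z a -bb0 -si "data.7z"` and `7z x -so "data.7z"`.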
@@ -122,7 +124,7 @@ const compress_info &find_compress_type(const std::string &file)
// set reference time stamp during executable/library init.
// should provide better resolution than using epoch, if the system clock supports it.
auto initial_time = std::chrono::steady_clock::now();
}
} // namespace
using namespace LAMMPS_NS;

// get CPU time
@@ -1053,6 +1055,18 @@ FILE *platform::compressed_read(const std::string &file)
const auto &compress = find_compress_type(file);
if (compress.style == ::compress_info::NONE) return nullptr;

// make certain the file exists and is readable

std::error_code ec;
if (!std::filesystem::exists(file, ec)) {
errno = ENOENT;
return nullptr;
}
if (!file_is_readable(file)) {
errno = EPERM;
return nullptr;
}

if (find_exe_path(compress.command).size())
// put quotes around file name so that they may contain blanks
fp = popen((compress.command + compress.uncompressflags + "\"" + file + "\""), "r");
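The new existence check matters because popen() reports failures such as a failed fork() or pipe(), but not a decompressor that fails after the shell has started; in that case the caller still gets a valid FILE pointer that simply reaches end-of-file. Checking up front lets the caller report a meaningful errno. A minimal sketch, assuming that opening an std::ifstream is an acceptable stand-in for platform::file_is_readable() (whose body is not part of this diff); `precheck_readable()` is a hypothetical name:

```cpp
#include <cerrno>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>

// Hypothetical pre-check before handing 'file' to a decompressor via popen().
// Returns 0 if the file exists and can be opened for reading, otherwise an
// errno value: ENOENT for a missing file, EPERM for an unreadable one (the
// same codes the new compressed_read() code sets).
int precheck_readable(const std::string &file) {
  std::error_code ec;
  if (!std::filesystem::exists(file, ec)) return ENOENT;
  std::ifstream in(file);  // assumption: stand-in for platform::file_is_readable()
  if (!in.is_open()) return EPERM;
  return 0;
}
```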
@@ -1073,9 +1087,14 @@ FILE *platform::compressed_write(const std::string &file)
if (compress.style == ::compress_info::NONE) return nullptr;
if (!file_is_writable(file)) return nullptr;

if (find_exe_path(compress.command).size())
// put quotes around file name so that they may contain blanks
if (find_exe_path(compress.command).size()) {
// explicitly delete an existing file for compatibility with commands that cannot write to stdout:
// for those we don't use redirection to a file, but pass the file name as an argument directly.
// without deleting first, this can result in failure or in the same file name being included multiple times
if (file_is_readable(file)) unlink(file);
// put quotes around file name for shell command so that they may contain blanks
fp = popen((compress.command + compress.compressflags + "\"" + file + "\""), "w");
}
#endif
return fp;
}
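With shell redirection (`> "file"`), an existing output file is truncated by the shell, but a tool such as `7z a` opens the named archive itself and adds to it if it already exists, which is why compressed_write() now removes the old file first. A hypothetical helper with the same intent, written with std::filesystem instead of LAMMPS's own unlink() wrapper:

```cpp
#include <filesystem>
#include <string>
#include <system_error>

// Hypothetical helper: delete a stale output file before starting a
// compressor that opens the file itself (e.g. "7z a", which adds to an
// existing archive). Returns false only if an existing file could not be
// removed; a missing file is not an error.
bool remove_stale_output(const std::string &file) {
  std::error_code ec;
  std::filesystem::remove(file, ec);  // missing file: returns false, ec stays clear
  return !ec;
}
```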
2 changes: 1 addition & 1 deletion src/platform.h
@@ -382,7 +382,7 @@ namespace platform {

/*! Check if a file name ends in a known extension for a compressed file format
*
* Currently supported file extensions are: .gz, .bz2, .zst, .xz, .lzma, lz4
* Currently supported file extensions are: .gz, .bz2, .zst, .xz, .lzma, .lz4, .br, and .7z
*
* \param file name of the file to check
* \return true if the file has a known extension, otherwise false */
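A stand-alone approximation of the documented check, assuming it compares only the final filename extension and is case-sensitive. The actual implementation goes through find_compress_type() in platform.cpp, whose body is not shown in this diff, so this is a sketch rather than the LAMMPS code:

```cpp
#include <algorithm>
#include <array>
#include <filesystem>
#include <string>

// Hypothetical stand-alone version of platform::has_compress_extension():
// compares the last filename extension against the suffixes listed above.
bool has_compress_extension(const std::string &file) {
  static const std::array<std::string, 8> known = {".gz",   ".bz2", ".zst", ".xz",
                                                   ".lzma", ".lz4", ".br",  ".7z"};
  const std::string ext = std::filesystem::path(file).extension().string();
  return std::find(known.begin(), known.end(), ext) != known.end();
}
```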
1 change: 1 addition & 0 deletions src/version.h
@@ -1 +1,2 @@
#define LAMMPS_VERSION "10 Dec 2025"
#define LAMMPS_UPDATE "Development"
6 changes: 5 additions & 1 deletion src/write_data.cpp
@@ -203,7 +203,11 @@ void WriteData::write(const std::string &file)
// open data file

if (comm->me == 0) {
fp = fopen(file.c_str(),"w");
if (platform::has_compress_extension(file)) {
fp = platform::compressed_write(file);
} else {
fp = fopen(file.c_str(), "w");
}
if (fp == nullptr)
error->one(FLERR,"Cannot open data file {}: {}", file, utils::getsyserror());
}
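This hunk switches write_data to compressed_write() for compressed suffixes; it does not show how the stream is closed. Under POSIX, a stream obtained from popen() is meant to be closed with pclose(), which also waits for the compressor to finish, while fopen() streams use fclose(). One way to keep that pairing straight is a small RAII wrapper; `DataFile` is a hypothetical name, not a LAMMPS class:

```cpp
#include <cstdio>
#include <string>

// Hypothetical RAII wrapper: remembers whether the stream came from popen()
// so that it is released with the matching pclose() instead of fclose().
class DataFile {
 public:
  DataFile(FILE *fp, bool piped) : fp_(fp), piped_(piped) {}
  ~DataFile() { close(); }
  DataFile(const DataFile &) = delete;
  DataFile &operator=(const DataFile &) = delete;

  FILE *get() const { return fp_; }
  bool piped() const { return piped_; }

  // Close once; returns the pclose()/fclose() result, or 0 if already closed.
  int close() {
    if (fp_ == nullptr) return 0;
    int rv = piped_ ? ::pclose(fp_) : std::fclose(fp_);
    fp_ = nullptr;
    return rv;
  }

 private:
  FILE *fp_;
  bool piped_;
};
```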