diff --git a/doc/src/Build_settings.rst b/doc/src/Build_settings.rst index d9bd2cbc95c..f057d218250 100644 --- a/doc/src/Build_settings.rst +++ b/doc/src/Build_settings.rst @@ -459,11 +459,21 @@ supports the "popen" function in the standard runtime library. Read or write compressed files ----------------------------------------- +.. versionchanged:: TBD + + Added support for ``brotli`` and ``7-zip`` + If this option is enabled, large files can be read or written with compression by ``gzip`` or similar tools by several LAMMPS commands, -including :doc:`read_data `, :doc:`rerun `, and -:doc:`dump `. Supported compression tools and algorithms are currently -``gzip``, ``bzip2``, ``zstd``, ``xz``, ``lz4``, and ``lzma`` (via xz). +including :doc:`read_data `, :doc:`write_data `, +:doc:`rerun `, :doc:`dump `, and :doc:`write_dump +`. Supported compression tools and algorithms are currently +``gzip``, ``bzip2``, ``zstd``, ``xz``, ``lz4``, ``lzma`` (via xz), +``brotli``, and ``7-zip`` (via 7z). LAMMPS checks at runtime which +compression commands are available and adjusts the check for supported +suffixes accordingly. The list of available compression formats and +suffixes is shown when running LAMMPS with the :doc:`-help or -h +command-line flag `. .. tabs:: diff --git a/doc/src/dump.rst b/doc/src/dump.rst index 2fd673ce48f..a17357f3377 100644 --- a/doc/src/dump.rst +++ b/doc/src/dump.rst @@ -210,7 +210,7 @@ support these options; see details on the :doc:`dump_modify ` doc page. As described below, the filename determines the kind of output: text -or binary or gzipped, one big file or one per timestep, one file for +or binary or compressed, one big file or one per timestep, one file for all the processors or multiple smaller files. .. note:: @@ -659,22 +659,24 @@ the binary file. The format of the binary file can be understood by looking at the :file:`tools/binary2txt.cpp` file. This option is only available for the *atom* and *custom* styles. 
-If the filename ends with ".gz", the dump file (or files, if "\*" or "%" -is also used) is written in gzipped format. A gzipped dump file will be -about :math:`3\times` smaller than the text version, but will also take -longer to write. This option is not available for the *dcd* and *xtc* -styles. +If LAMMPS has been compiled with the :doc:`corresponding setting +` and if the filename ends with ".gz" or some other +:ref:`supported compression format suffix `, the dump file (or +files, if "\*" or "%" is also used) is written in compressed format. A +compressed dump file will be about :math:`3\times` smaller than the text +version, but will also take longer to write. This option is not +available for the *dcd* and *xtc* styles. Note that styles that end with *gz* are identical in command syntax to -the corresponding styles without "gz", however, they generate -compressed files using the zlib library. Thus the filename suffix -".gz" is mandatory. This is an alternative approach to writing -compressed files via a pipe, as done by the regular dump styles, which -may be required on clusters where the interface to the high-speed -network disallows using the fork() library call (which is needed for a -pipe). For the remainder of this page, you should thus consider the -*atom* and *atom/gz* styles (etc.) to be inter-changeable, with the -exception of the required filename suffix. +the corresponding styles without "gz"; however, they generate compressed +files using the zlib library. Thus the filename suffix ".gz" is +mandatory. This is an alternative approach to writing compressed files +via a pipe (see above), as done by the regular dump styles, which may be +required on HPC clusters where the interface to the high-speed network +disallows using the fork() library call (which is needed for a pipe). +For the remainder of this page, you should thus consider the *atom* and +*atom/gz* styles (etc.) to be interchangeable, with the exception of +the required filename suffix. 
Similarly, styles that end with *zstd* are identical to the gz styles, but use the Zstd compression library instead and require a ".zst" @@ -1024,8 +1026,8 @@ to effectively specify multiple values. Restrictions """""""""""" -To write gzipped dump files, you must either compile LAMMPS with the --DLAMMPS_GZIP option or use the styles from the COMPRESS package. +To write compressed dump files, you must either compile LAMMPS with the +``-DLAMMPS_GZIP`` option or use the styles from the COMPRESS package. See the :doc:`Build settings ` page for details. While a dump command is active (i.e., has not been stopped by using diff --git a/doc/src/dump_image.rst b/doc/src/dump_image.rst index 04300330c78..ec769b5e8f5 100644 --- a/doc/src/dump_image.rst +++ b/doc/src/dump_image.rst @@ -216,17 +216,17 @@ Here are two sample images, rendered as :math:`1024\times 1024` JPEG files. Only atoms in the specified group are rendered in the image. The :doc:`dump_modify region and thresh ` commands can also -alter what atoms are included in the image. -The filename suffix determines whether a JPEG, PNG, or PPM file is -created with the *image* dump style. If the suffix is ".jpg" or -".jpeg", then a `JPEG format `_ file is created, if the -suffix is ".png", then a `PNG format `_ is created, else -a `PPM (aka NETPBM) format `_ file is created. -The JPEG and PNG files are binary; PPM has a text mode header followed -by binary data. JPEG images have lossy compression, PNG has lossless -compression, and PPM files are uncompressed but can be compressed with -gzip, if LAMMPS has been compiled with -DLAMMPS_GZIP and a ".gz" suffix -is used. +alter what atoms are included in the image. The filename suffix +determines whether a JPEG, PNG, or PPM file is created with the *image* +dump style. If the suffix is ".jpg" or ".jpeg", then a `JPEG format +`_ file is created, if the suffix is ".png", then a `PNG +format `_ is created, else a `PPM (aka NETPBM) format +`_ file is created. 
The JPEG and PNG files are binary; PPM +has a text mode header followed by binary data. JPEG images have lossy +compression, PNG has lossless compression, and PPM files are +uncompressed but can be compressed with a supported compression program, +if LAMMPS has been compiled with :ref:`compression support ` and a +supported suffix is used. .. _jpeg_format: https://jpeg.org/jpeg/ .. _png_format: https://en.wikipedia.org/wiki/Portable_Network_Graphics diff --git a/doc/src/dump_modify.rst b/doc/src/dump_modify.rst index ca4d08209b8..f1f0a4dbe90 100644 --- a/doc/src/dump_modify.rst +++ b/doc/src/dump_modify.rst @@ -150,7 +150,7 @@ various dump styles, including the :doc:`dump image ` and The *append* keyword applies to all dump styles except *cfg* and *xtc* and *dcd*\ . It also applies only to text output files, not to binary -or gzipped or image/movie files. If specified as *yes*, then dump +or compressed or image/movie files. If specified as *yes*, then dump snapshots are appended to the end of an existing dump file. If specified as *no*, then a new dump file will be created which will overwrite an existing file with the same name. @@ -170,7 +170,7 @@ dump file has been opened, this keyword has no further effect. The *buffer* keyword applies only to dump styles *atom*, *cfg*, *custom*, *local*, and *xyz*\ . It also applies only to text output -files, not to binary or gzipped files. If specified as *yes*, which +files, not to binary or compressed files. If specified as *yes*, which is the default, then each processor writes its output into an internal text buffer, which is then sent to the processor(s) which perform file writes, and written by those processors(s) as one large chunk of text. 
diff --git a/doc/src/fix_reaxff_bonds.rst b/doc/src/fix_reaxff_bonds.rst index 1194af617c3..748fadf23d4 100644 --- a/doc/src/fix_reaxff_bonds.rst +++ b/doc/src/fix_reaxff_bonds.rst @@ -52,9 +52,10 @@ The meaning of the column header abbreviations is as follows: * nlp = number of lone pairs * q = atomic charge -If the filename ends with ".gz", the output file is written in gzipped -format. A gzipped dump file will be about 3x smaller than the text -version, but will also take longer to write. +If the filename ends with ".gz" or some :ref:`other supported +compression format suffix `, the output file is written in +compressed format. A compressed output file can be significantly +smaller than the text version, but will also take longer to write. .. versionadded:: 2Apr2025 @@ -93,8 +94,9 @@ The fix reaxff/bonds command requires that the :doc:`pair_style reaxff is only enabled if LAMMPS was built with that package. See the :doc:`Build package ` page for more info. -To write gzipped bond files, you must compile LAMMPS with the --DLAMMPS_GZIP option. +To write compressed bond files, you must compile LAMMPS with the +``-DLAMMPS_GZIP`` option. See the :doc:`Build settings ` +doc page for details. Related commands """""""""""""""" diff --git a/doc/src/fix_reaxff_species.rst b/doc/src/fix_reaxff_species.rst index dd73289e761..42b93badefe 100644 --- a/doc/src/fix_reaxff_species.rst +++ b/doc/src/fix_reaxff_species.rst @@ -86,9 +86,10 @@ the first line. calculations, reneighboring only every 100 steps is already quite a low frequency. -If the filename ends with ".gz", the output file is written in gzipped -format. A gzipped dump file will be about 3x smaller than the text version, -but will also take longer to write. +If the filename ends with ".gz" or some :ref:`other supported +compression format suffix `, the output file is written in +compressed format. A compressed output file can be significantly +smaller than the text version, but will also take longer to write. .. 
versionadded:: 15Jun2023 @@ -296,7 +297,9 @@ The "fix reaxff/species" requires that :doc:`pair_style reaxff ` is This fix is part of the REAXFF package. It is only enabled if LAMMPS was built with that package. See the :doc:`Build package ` page for more info. -To write gzipped species files, you must compile LAMMPS with the -DLAMMPS_GZIP option. +To write compressed species files, you must compile LAMMPS with the +``-DLAMMPS_GZIP`` option. See the :doc:`Build settings ` +doc page for details. Related commands """""""""""""""" diff --git a/doc/src/fix_tmd.rst b/doc/src/fix_tmd.rst index 3ee7f2a5c8e..c12e2e58dd9 100644 --- a/doc/src/fix_tmd.rst +++ b/doc/src/fix_tmd.rst @@ -40,8 +40,9 @@ group and the target coordinates listed in file1. Thus a value of rho_final = 0.0 means move the atoms all the way to the final structure during the course of the run. -The target file1 can be ASCII text or a gzipped text file (detected by -a .gz suffix). The format of the target file1 is as follows: +The target file1 can be ASCII text or a compressed text file (detected +by a :ref:`supported compression format suffix `). The format of +the target file1 is as follows: .. parsed-literal:: @@ -120,8 +121,8 @@ which a SHAKE fix is applied. This is because LAMMPS assumes there are not multiple competing holonomic constraints applied to the same atoms. -To read gzipped target files, you must compile LAMMPS with the --DLAMMPS_GZIP option. See the :doc:`Build settings ` +To read compressed target files, you must compile LAMMPS with the +``-DLAMMPS_GZIP`` option. See the :doc:`Build settings ` doc page for details. Related commands diff --git a/doc/src/neb.rst b/doc/src/neb.rst index b626796b6bd..6dd5c88f203 100644 --- a/doc/src/neb.rst +++ b/doc/src/neb.rst @@ -278,11 +278,12 @@ larger than you would normally use for dynamics simulations. Each file read by the neb command containing atomic coordinates used to initialize one or more replicas must be formatted as follows. 
-The file can be ASCII text or a gzipped text file (detected by a .gz -suffix). The file can contain initial blank lines or comment lines -starting with "#" which are ignored. The first non-blank, non-comment -line should list N = the number of lines to follow. The N successive -lines contain the following information: +The file can be ASCII text or a compressed text file (detected by a +:ref:`supported compression format suffix `). The file can +contain initial blank lines or comment lines starting with "#" which are +ignored. The first non-blank, non-comment line should list N = the +number of lines to follow. The N successive lines contain the following +information: .. parsed-literal:: @@ -440,6 +441,10 @@ This command can only be used if LAMMPS was built with the REPLICA package. See the :doc:`Build package ` doc page for more info. +To read compressed files, you must compile LAMMPS with the +``-DLAMMPS_GZIP`` option. See the :doc:`Build settings +` doc page for details. + ---------- Related commands diff --git a/doc/src/read_data.rst b/doc/src/read_data.rst index b2c24d66c76..53de3a2a5c1 100644 --- a/doc/src/read_data.rst +++ b/doc/src/read_data.rst @@ -62,8 +62,9 @@ Description """"""""""" Read in a data file containing information LAMMPS needs to run a -simulation. The file can be ASCII text or a gzipped text file -(detected by a .gz suffix). +simulation. The file can be ASCII text or a compressed text file +(detected by its suffix) if LAMMPS has been compiled with support +for :ref:`compression commands `. This is one of 3 ways to specify the simulation box: see the :doc:`create_box ` and :doc:`read_restart ` @@ -1717,8 +1718,8 @@ Translational velocities can also be (re)set by the :doc:`velocity Restrictions """""""""""" -To read gzipped data files, you must compile LAMMPS with the --DLAMMPS_GZIP option. See the :doc:`Build settings ` +To read compressed data files, you must compile LAMMPS with the +``-DLAMMPS_GZIP`` option. 
See the :doc:`Build settings ` doc page for details. Label maps are currently not supported when using the KOKKOS package. diff --git a/doc/src/write_data.rst b/doc/src/write_data.rst index 937ab5ba028..aa7b6e3384f 100644 --- a/doc/src/write_data.rst +++ b/doc/src/write_data.rst @@ -31,16 +31,22 @@ Examples .. code-block:: LAMMPS write_data data.polymer + write_data data.polymer.gz write_data data.* write_data data.solid triclinic/general Description """"""""""" -Write a data file in text format of the current state of the -simulation. Data files can be read by the :doc:`read data ` -command to begin a simulation. The :doc:`read_data ` command -also describes their format. +Write a data file in text format of the current state of the simulation. +Data files can be read by the :doc:`read data ` command to +begin a simulation. + +.. versionadded:: TBD + +The file may also be a compressed text file (detected by its suffix) if +LAMMPS has been compiled with support for :ref:`compression commands +` and the corresponding compression program is available. Similar to :doc:`dump ` files, the data filename can contain a "\*" wild-card character. The "\*" is replaced with the current timestep @@ -183,6 +189,10 @@ before the data file is written. This means that your system must be ready to perform a simulation before using this command (force fields setup, atom masses initialized, etc). +To write compressed data files, you must compile LAMMPS with the +``-DLAMMPS_GZIP`` option. See the :doc:`Build settings +` doc page for details. 
+ Related commands """""""""""""""" diff --git a/src/platform.cpp b/src/platform.cpp index bedd716918f..e57c8fa4e01 100644 --- a/src/platform.cpp +++ b/src/platform.cpp @@ -83,7 +83,7 @@ namespace { /// Struct for listing on-the-fly compression/decompression commands struct compress_info { /// identifier for the different compression algorithms - enum styles { NONE, GZIP, BZIP2, ZSTD, XZ, LZMA, LZ4 }; + enum styles { NONE, GZIP, BZIP2, ZSTD, XZ, LZMA, LZ4, BROTLI, SEVENZIP }; const std::string extension; ///< filename extension for the current algorithm const std::string command; ///< command to perform compression or decompression const std::string compressflags; ///< flags to append to compress from stdin to stdout @@ -100,6 +100,8 @@ const std::vector compress_styles = { {"xz", "xz", " > ", " -cdf ", compress_info::XZ}, {"lzma", "xz", " --format=lzma > ", " --format=lzma -cdf ", compress_info::LZMA}, {"lz4", "lz4", " > ", " -cdf ", compress_info::LZ4}, + {"br", "brotli", " > ", " -cdf ", compress_info::BROTLI}, + {"7z", "7z", " a -bb0 -si ", " x -so ", compress_info::SEVENZIP}, }; // clang-format on @@ -122,7 +124,7 @@ const compress_info &find_compress_type(const std::string &file) // set reference time stamp during executable/library init. // should provide better resolution than using epoch, if the system clock supports it. 
auto initial_time = std::chrono::steady_clock::now(); -} +} // namespace using namespace LAMMPS_NS; // get CPU time @@ -1053,6 +1055,18 @@ FILE *platform::compressed_read(const std::string &file) const auto &compress = find_compress_type(file); if (compress.style == ::compress_info::NONE) return nullptr; + // make certain the file exists and is readable + + std::error_code ec; + if (!std::filesystem::exists(file, ec)) { + errno = ENOENT; + return nullptr; + } + if (!file_is_readable(file)) { + errno = EPERM; + return nullptr; + } + if (find_exe_path(compress.command).size()) // put quotes around file name so that they may contain blanks fp = popen((compress.command + compress.uncompressflags + "\"" + file + "\""), "r"); @@ -1073,9 +1087,14 @@ FILE *platform::compressed_write(const std::string &file) if (compress.style == ::compress_info::NONE) return nullptr; if (!file_is_writable(file)) return nullptr; - if (find_exe_path(compress.command).size()) - // put quotes around file name so that they may contain blanks + if (find_exe_path(compress.command).size()) { + // explicitly delete existing files for compatibility with commands that cannot write to stdout + // and thus we don't use redirection to a file, but provide the file name as argument directly. + // this can result in failure or inclusion of the same filename multiple times without deleting + if (file_is_readable(file)) unlink(file); + // put quotes around file name for shell command so that they may contain blanks fp = popen((compress.command + compress.compressflags + "\"" + file + "\""), "w"); + } #endif return fp; } diff --git a/src/platform.h b/src/platform.h index ffbc35262fd..b3fa61de794 100644 --- a/src/platform.h +++ b/src/platform.h @@ -382,7 +382,7 @@ namespace platform { /*! 
Check if a file name ends in a known extension for a compressed file format * - * Currently supported file extensions are: .gz, .bz2, .zst, .xz, .lzma, lz4 + * Currently supported file extensions are: .gz, .bz2, .zst, .xz, .lzma, .lz4, .br, and .7z * * \param file name of the file to check * \return true if the file has a known extension, otherwise false */ diff --git a/src/version.h b/src/version.h index e27241b11a3..c7397ce379b 100644 --- a/src/version.h +++ b/src/version.h @@ -1 +1,2 @@ #define LAMMPS_VERSION "10 Dec 2025" +#define LAMMPS_UPDATE "Development" diff --git a/src/write_data.cpp b/src/write_data.cpp index 1c92462de16..98b2fad6572 100644 --- a/src/write_data.cpp +++ b/src/write_data.cpp @@ -203,7 +203,11 @@ void WriteData::write(const std::string &file) // open data file if (comm->me == 0) { - fp = fopen(file.c_str(),"w"); + if (platform::has_compress_extension(file)) { + fp = platform::compressed_write(file); + } else { + fp = fopen(file.c_str(), "w"); + } if (fp == nullptr) error->one(FLERR,"Cannot open data file {}: {}", file, utils::getsyserror()); }
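
Note on the suffix table: the sketch below illustrates how the entries added to ``compress_styles`` turn into shell commands for the pipe. It is a standalone illustration and not the code in ``src/platform.cpp``. The names ``CompressStyle``, ``find_style``, ``write_command``, and ``read_command`` are hypothetical, and only the table rows visible in this patch are reproduced. The availability check via ``find_exe_path`` and the actual ``popen`` call are left out.

```cpp
// Hypothetical sketch: map a filename suffix to the shell command string
// that would be handed to popen(). Names are illustrative, not LAMMPS APIs.
#include <string>
#include <vector>

struct CompressStyle {
  std::string extension;        // filename suffix without the leading dot
  std::string command;          // external program to run
  std::string compressflags;    // text placed between command and quoted file when writing
  std::string uncompressflags;  // text placed between command and quoted file when reading
};

// Rows copied from the hunk in this patch; other formats omitted on purpose.
static const std::vector<CompressStyle> styles = {
    {"xz", "xz", " > ", " -cdf "},
    {"lzma", "xz", " --format=lzma > ", " --format=lzma -cdf "},
    {"lz4", "lz4", " > ", " -cdf "},
    {"br", "brotli", " > ", " -cdf "},
    {"7z", "7z", " a -bb0 -si ", " x -so "},
};

// Return the table entry matching the text after the last '.', or nullptr.
const CompressStyle *find_style(const std::string &file)
{
  const auto dot = file.rfind('.');
  if (dot == std::string::npos) return nullptr;
  const std::string ext = file.substr(dot + 1);
  for (const auto &s : styles)
    if (s.extension == ext) return &s;
  return nullptr;
}

// Command for writing: data arrives on stdin of the external program.
// The file name is quoted so that it may contain blanks.
std::string write_command(const std::string &file)
{
  const auto *s = find_style(file);
  if (!s) return "";
  return s->command + s->compressflags + "\"" + file + "\"";
}

// Command for reading: decompressed data is produced on stdout.
std::string read_command(const std::string &file)
{
  const auto *s = find_style(file);
  if (!s) return "";
  return s->command + s->uncompressflags + "\"" + file + "\"";
}
```

With this table, most formats write through a shell redirection (``brotli > "file"``), while the ``7z`` row passes the file name as an argument and reads from stdin via ``-si``. That difference appears to be why ``compressed_write`` now removes an existing file before opening the pipe.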