Skip to content

Conversation

@AXiX-official
Copy link

This PR implements default LZ4 compression for PCH/PCM.

There are a couple of points I'm not sure about

  • whether or not to make compression optional in the config
  • whether or not zstd would have a better compression ratio with similar overhead to lz4
  • I'm not sure if clang can output PCH to memory instead of disk, if it can I'll improve the current implementation to compress files from memory

related to #209

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary of Changes

Hello @AXiX-official, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces default LZ4 compression for Precompiled Headers (PCH) and Precompiled Modules (PCM). The primary goal is to reduce the disk footprint of these large intermediate files, which can improve storage efficiency and potentially speed up build times by reducing I/O. The implementation includes the necessary build system updates, core compression/decompression utilities, and modifications to the compiler's handling of PCH/PCM files to seamlessly work with the compressed format.

Highlights

  • LZ4 Compression Integration: The LZ4 compression library has been integrated into the project's build system (CMake and xmake) to enable efficient compression of Precompiled Headers (PCH) and Precompiled Modules (PCM).
  • PCH/PCM Compression & Decompression: New utility functions have been introduced to compress PCH/PCM files using LZ4, storing them with a custom header containing the original file size. Corresponding decompression logic allows these files to be read back into memory.
  • In-Memory PCH/PCM Handling: The compilation process has been updated to detect and decompress LZ4-compressed PCH/PCM files on the fly. Decompressed content is then provided to Clang via an InMemoryFileSystem to allow it to be treated as if it were a file on disk.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in issue comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This PR adds LZ4 compression for PCH/PCM files to reduce disk space usage. The implementation involves adding lz4 as a dependency, creating new compression/decompression utilities, and integrating them into the PCH/PCM build and consumption pipeline. The use of a virtual file system to feed decompressed in-memory PCHs to Clang is a clever approach.

My review focuses on improving error handling to prevent server crashes, fixing a logic bug in the PCH loading fallback, and suggesting performance improvements by using llvm::StringRef for paths. I've also provided feedback on the open questions from your PR description:

  • Configurable Compression: Making compression optional via configuration would be a great addition. Some users might prefer to disable it to save CPU cycles, especially on fast storage.
  • zstd vs lz4: zstd generally offers better compression ratios than lz4, though often at the cost of compression/decompression speed. It would be valuable to benchmark both for your use case with PCH files. You could consider supporting both and making the algorithm configurable.
  • In-memory PCH generation: Clang can indeed write PCH output to a memory buffer. You can achieve this by providing a custom llvm::raw_pwrite_stream to the clang::GeneratePCHAction. This would be a significant optimization, as it would avoid writing the uncompressed PCH to disk, only to read it back for compression.

@AXiX-official
Copy link
Author

Another thing is, is it necessary to use a standard compression format so that tools like 7z can decompress it properly?

@16bit-ykiko
Copy link
Member

Another thing is, is it necessary to use a standard compression format so that tools like 7z can decompress it properly?

No, this is an internal cache file and is not exposed to the user.

@16bit-ykiko
Copy link
Member

It is possible to store and read PCH or PCM files in memory, but this requires using the Clang API, which can be complex. Besides, I think we should only zip less frequently used files rather than all of them.

@AXiX-official
Copy link
Author

Does less frequently used mean less frequently accessed or less frequently updated?

My current design is to add a background task at startup, delay the trigger for a certain period of time, and then trigger the task to detect the files to be compressed at regular intervals. Is it necessary to configure the delay, timing interval and cold file determination interval?

@16bit-ykiko
Copy link
Member

I think it's best to put this PR on hold for now. I might make some major updates to the PCH and PCM build processes within the next month, and the related logic could change significantly :(.

@AXiX-official
Copy link
Author

ok

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants