-
Notifications
You must be signed in to change notification settings - Fork 59
Add compression for PCH/PCM #232
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Summary of Changes
Hello @AXiX-official, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request introduces default LZ4 compression for Precompiled Headers (PCH) and Precompiled Modules (PCM). The primary goal is to reduce the disk footprint of these large intermediate files, which can improve storage efficiency and potentially speed up build times by reducing I/O. The implementation includes the necessary build system updates, core compression/decompression utilities, and modifications to the compiler's handling of PCH/PCM files to seamlessly work with the compressed format.
Highlights
- LZ4 Compression Integration: The LZ4 compression library has been integrated into the project's build system (CMake and xmake) to enable efficient compression of Precompiled Headers (PCH) and Precompiled Modules (PCM).
- PCH/PCM Compression & Decompression: New utility functions have been introduced to compress PCH/PCM files using LZ4, storing them with a custom header containing the original file size. Corresponding decompression logic allows these files to be read back into memory.
- In-Memory PCH/PCM Handling: The compilation process has been updated to detect and decompress LZ4-compressed PCH/PCM files on the fly. Decompressed content is then provided to Clang via an InMemoryFileSystem to allow it to be treated as if it were a file on disk.
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.
| Feature | Command | Description |
|---|---|---|
| Code Review | /gemini review |
Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary |
Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | /gemini help |
Displays a list of available commands. |
Customization
To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes
-
Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This PR adds LZ4 compression for PCH/PCM files to reduce disk space usage. The implementation involves adding lz4 as a dependency, creating new compression/decompression utilities, and integrating them into the PCH/PCM build and consumption pipeline. The use of a virtual file system to feed decompressed in-memory PCHs to Clang is a clever approach.
My review focuses on improving error handling to prevent server crashes, fixing a logic bug in the PCH loading fallback, and suggesting performance improvements by using llvm::StringRef for paths. I've also provided feedback on the open questions from your PR description:
- Configurable Compression: Making compression optional via configuration would be a great addition. Some users might prefer to disable it to save CPU cycles, especially on fast storage.
- zstd vs lz4: zstd generally offers better compression ratios than lz4, though often at the cost of compression/decompression speed. It would be valuable to benchmark both for your use case with PCH files. You could consider supporting both and making the algorithm configurable.
- In-memory PCH generation: Clang can indeed write PCH output to a memory buffer. You can achieve this by providing a custom
llvm::raw_pwrite_streamto theclang::GeneratePCHAction. This would be a significant optimization, as it would avoid writing the uncompressed PCH to disk, only to read it back for compression.
|
Another thing is, is it necessary to use a standard compression format so that tools like 7z can decompress it properly? |
No, this is an internal cache file and is not exposed to the user. |
|
It is possible to store and read PCH or PCM files in memory, but this requires using the Clang API, which can be complex. Besides, I think we should only zip less frequently used files rather than all of them. |
|
Does My current design is to add a background task at startup, delay the trigger for a certain period of time, and then trigger the task to detect the files to be compressed at regular intervals. Is it necessary to configure the delay, timing interval and cold file determination interval? |
|
I think it's best to put this PR on hold for now. I might make some major updates to the PCH and PCM build processes within the next month, and the related logic could change significantly :(. |
|
ok |
This PR implements default LZ4 compression for PCH/PCM.
There are a couple of points I'm not sure about
related to #209