-
Notifications
You must be signed in to change notification settings - Fork 130
Add arena helper functions and improve block insertion #210
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
9575ef2
to
53bec08
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the new functions arena_calloc()
, arena_realloc()
, arena_strdup()
and arena_memdup()
, they have been added but are not used by any callers, and I am unsure whether it is appropriate to keep these unused functions in the codebase.
Based on my current observations, most of the memory allocation in shecc follow a unified allocate-and-release model. This makes almost all scenarios suitable for replacing the current use of In the short term, I'm also working on improving the current Additionally, I’m exploring optimizations like improving cache locality by replacing the linked list used in the internal structure of the hashmap. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make git commit messages always informative based on the motivations and proposed code changes.
Bito Review Skipped - No Changes Detected |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoid using backticks in commit messages. Backticks can be easily confused with single quotes on some terminals, reducing readability. Plain text or single quotes provide sufficient clarity and emphasis.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall changes are fine. Can you also show how this patch would affect stage 1 binary compilation performance?
The Stage-1 benchmark has already been added to the initial conversation Since I don't have ARM hardware on hand, I can only use QEMU-ARM for simulation to run benchmarks. |
No, wait for the empirical confirmation from @ChAoSUnItY . |
Refactor 'arena_alloc' to insert newly allocated blocks at the head of the block list, improving allocation efficiency. This reduces pointer chasing during allocation and enables recently freed memory to be reused more quickly. Introduce 'arena_calloc', 'arena_memdup', 'arena_strdup', and 'arena_realloc' to extend the arena allocator's API: - 'arena_calloc': Allocate zero-initialized memory blocks. - 'arena_memdup': Duplicate arbitrary data into arena-managed memory. - 'arena_strdup': Copy null-terminated strings into the arena. - 'arena_realloc': Resize existing arena-managed allocations. These improvements lay the foundation for replacing standard 'malloc', 'calloc', 'realloc', and 'free' with the arena allocator, providing better performance, consistent memory handling, and simpler batch deallocation throughout the project.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've confirmed the following benchmark result generated from stage 1 compilation on Raspberry Pi 4B:
Before
Result
Warming up 5 times...
Running 15 benchmarks...
Run 1: user=5.28s sys=1.92s elapsed=7.21s maxrss=628864KB
Run 2: user=5.12s sys=2.02s elapsed=7.15s maxrss=628864KB
Run 3: user=5.16s sys=1.96s elapsed=7.13s maxrss=628864KB
Run 4: user=5.26s sys=1.92s elapsed=7.20s maxrss=628864KB
Run 5: user=5.17s sys=2.00s elapsed=7.18s maxrss=628864KB
Run 6: user=5.19s sys=1.97s elapsed=7.16s maxrss=628864KB
Run 7: user=5.25s sys=1.95s elapsed=7.21s maxrss=628736KB
Run 8: user=5.20s sys=1.96s elapsed=7.18s maxrss=628864KB
Run 9: user=5.13s sys=1.96s elapsed=7.11s maxrss=628864KB
Run 10: user=5.24s sys=1.89s elapsed=7.14s maxrss=628864KB
Run 11: user=5.26s sys=1.93s elapsed=7.21s maxrss=628864KB
Run 12: user=5.33s sys=1.90s elapsed=7.24s maxrss=628864KB
Run 13: user=5.21s sys=1.99s elapsed=7.21s maxrss=628864KB
Run 14: user=5.25s sys=1.96s elapsed=7.22s maxrss=628864KB
Run 15: user=5.06s sys=2.11s elapsed=7.18s maxrss=628864KB
----------------------------------
Average user time: 5.207333s
Average system time: 1.962667s
Average elapsed time: 7.182000s
Average max RSS: 628855.47 KB
After
Result
Warming up 5 times...
Running 15 benchmarks...
Run 1: user=0.74s sys=1.71s elapsed=2.47s maxrss=424832KB
Run 2: user=0.73s sys=1.77s elapsed=2.51s maxrss=424960KB
Run 3: user=0.74s sys=1.71s elapsed=2.47s maxrss=424960KB
Run 4: user=0.80s sys=1.66s elapsed=2.47s maxrss=424832KB
Run 5: user=0.81s sys=1.64s elapsed=2.46s maxrss=424960KB
Run 6: user=0.72s sys=1.73s elapsed=2.47s maxrss=424960KB
Run 7: user=0.70s sys=1.74s elapsed=2.46s maxrss=424960KB
Run 8: user=0.77s sys=1.69s elapsed=2.47s maxrss=424832KB
Run 9: user=0.77s sys=1.68s elapsed=2.47s maxrss=424960KB
Run 10: user=0.75s sys=1.73s elapsed=2.49s maxrss=424960KB
Run 11: user=0.66s sys=1.80s elapsed=2.46s maxrss=424960KB
Run 12: user=0.66s sys=1.79s elapsed=2.46s maxrss=424960KB
Run 13: user=0.71s sys=1.75s elapsed=2.47s maxrss=424832KB
Run 14: user=0.75s sys=1.70s elapsed=2.47s maxrss=424960KB
Run 15: user=0.70s sys=1.77s elapsed=2.48s maxrss=424960KB
----------------------------------
Average user time: 0.734000s
Average system time: 1.724667s
Average elapsed time: 2.472000s
Average max RSS: 424925.87 KB
Benchmark machine
`.::///+:/-. --///+//-:`` chaos@raspberrypi
`+oooooooooooo: `+oooooooooooo: -----------------
/oooo++//ooooo: ooooo+//+ooooo. OS: Raspbian GNU/Linux 12 (bookworm) aarch64
`+ooooooo:-:oo- +o+::/ooooooo: Host: Raspberry Pi 4 Model B Rev 1.5
`:oooooooo+`` `.oooooooo+- Kernel: 6.6.51+rpt-rpi-v8
`:++ooo/. :+ooo+/.` Uptime: 32 mins
...` `.----.` ``.. Packages: 1604 (dpkg)
.::::-``:::::::::.`-:::-` Shell: bash 5.2.15
-:::-` .:::::::-` `-:::- Resolution: 1920x1080
`::. `.--.` `` `.---.``.::` DE: wlroots
.::::::::` -::::::::` ` Theme: PiXflat [GTK3]
.::` .:::::::::- `::::::::::``::. Icons: PiXflat [GTK3]
-:::` ::::::::::. ::::::::::.`:::- Terminal: lxterminal
:::: -::::::::. `-:::::::: :::: Terminal Font: Monospace 10
-::- .-:::-.``....``.-::-. -::- CPU: (4) @ 1.800GHz
.. `` .::::::::. `..`.. Memory: 692MiB / 3791MiB
-:::-` -::::::::::` .:::::`
:::::::` -::::::::::` :::::::.
.::::::: -::::::::. ::::::::
`-:::::` ..--.` ::::::.
`...` `...--..` `...`
.::::::::::
`.-::::-`
Thank @AW-AlanWu for contributing! |
This PR introduces several helper functions into the arena allocator and refactors the block insertion strategy in
arena_alloc
to improve performance. As a result, total allocation times have been measurably reduced.Below is the benchmark script and corresponding results demonstrating the speedup:
Hardware info
Benchmark Script on Stage-0
Benchmark Script
Before (Stage-0)
Result
After (Stage-0)
Result
Benchmark Script on Stage-1
Benchmark Script
Before (Stage-1)
Result
After (Stage-1)
Result
Summary by Bito
This pull request enhances arena memory management by introducing new helper functions and refactoring the block insertion strategy in `arena_alloc`, leading to improved performance and reduced allocation times. Additionally, error handling has been improved with clearer failure messages, enhancing code robustness.