Skip to content

Remove GLib dependency within bin/ #184

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 23 commits into
base: develop
Choose a base branch
from
Open

Conversation

mack-w
Copy link

@mack-w mack-w commented May 15, 2025

See #133.

mack-w added 5 commits May 15, 2025 02:10
This commit is the first in a series of commits to remove gLib.

This commit moves an unused thread pool library within traceAnalyzer
to the utils directory, which may find its use elsewhere.
This commit is the first in series (ignoring reverted) to remove gLib.

This commit introduces [hashmap.h](https://github.com/sheredom/hashmap.h)
in an effort to remove GHashTable dependencies from the project.

Future commits shall introduce similar header-only libs as well.
@mack-w mack-w marked this pull request as draft May 15, 2025 17:45
@1a1a11a
Copy link
Owner

1a1a11a commented May 16, 2025

just FYI, it might be better to start with a small PR :)

@mack-w
Copy link
Author

mack-w commented May 16, 2025

Thanks for the suggestion. So let's begin with something like "remove GLib dependency from main executable" first? Since gLib usage within the algo part is more extensive and complicated, maybe I shall get rid of GLib in bin/ first (and I'm close to that.)

BTW the quality gate check is complaining about duplicated code on hashmap.h... it's saying like "for i=0" is a duplicate. I guess I'm gonna ignore that unless there're other problems with cq.

@mack-w mack-w changed the title WIP: remove GLib dependency WIP: remove GLib dependency within *bin/* May 16, 2025
@mack-w mack-w changed the title WIP: remove GLib dependency within *bin/* WIP: remove GLib dependency within bin/ May 16, 2025
mack-w added 2 commits May 17, 2025 23:18
Move from GThreadPool to a homebrewed threadpool
adapted from https://nachtimwald.com/2019/04/12/thread-pool-in-c

Note that this code **only** runs with UBSan enabled.
Not for production use (yet).
@1a1a11a
Copy link
Owner

1a1a11a commented May 17, 2025

Sounds good!

mack-w added 2 commits May 19, 2025 01:15
Finalize move from GThreadPool to a homebrewed threadpool
adapted from https://nachtimwald.com/2019/04/12/thread-pool-in-c

Note, that simulation results differ from original.
Reason for this is under investigation.
@mack-w
Copy link
Author

mack-w commented May 19, 2025

@1a1a11a Hi! I would say that this PR is near finished.

In the meantime, I'm observing an at-most 1% difference in simulation results for some algorithms (e.g. FIFO) with mrcProfiler. I also see assertion error hit_size_vec[9] in testMrcProfiler::test_minisim_profiler(). I'm still investigating the reason for this.

Right now sanitizers are not throwing any useful error, and vvverbose logs look good as well. Since testMrcProfiler is single-threaded, there shall be no data races. I will try to git bisect what's going wrong here. In the meantime if you have any hints or you want to see the logs please reply.

PS: that strncpy security hotspot emerged out of the blue and I ASSERT_IT_IS_SAFE

@1a1a11a
Copy link
Owner

1a1a11a commented May 19, 2025

loop in @JasonGuo98

@1a1a11a 1a1a11a requested review from 1a1a11a and Copilot May 19, 2025 12:30
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR is focused on removing GLib dependencies in favor of using native pthreads and custom data structures. Key changes include replacing GLib’s GMutex, GThreadPool, and GHashTable with pthread-based synchronization primitives and custom thread pool/hashmap implementations, as well as updating include paths and launch configurations.

Reviewed Changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
libCacheSim/traceReader/reader.c Replace non-thread safe boolean flag with a volatile qualifier
libCacheSim/profiler/threadpool.c Introduce custom threadpool implementation adapted from external source
libCacheSim/profiler/simulator.c Replace GThreadPool and GMutex with pthread equivalents and update index conversion
libCacheSim/profiler/profilerLRU.c Switch from g_hash_table to custom hashmap functions
libCacheSim/profiler/dist.c Replace GLib hash table calls with custom hashmap functions
libCacheSim/include/libCacheSim/threadpool.h New header for threadpool API
libCacheSim/include/libCacheSim/hashmap_defs.in New header for hashmap definitions
.vscode/launch.json Updated executable path and arguments

@mack-w
Copy link
Author

mack-w commented May 19, 2025

loop in @JasonGuo98

Did a quick test, it seems that the thread pool library that I'm quoting is problematic. I switched to a battle-tested thread pool library, and (from preliminary test results) the problem is gone :)

Gotta do some git rebase afterwards...

For the deeper side of the problem, I suspect it's still the synchronisation problem with the reader that I'm not coping well with., since TSan reports a data race issue there. It is a benign problem at first glance, but who knows. Pushing a new commit to fix this soon.

Upd: seems more like a problem with simulation steps, rather than reader problem. Introduced with the latest commit. Overlooked the problem since TSan keeps reporting the other way.

@mack-w
Copy link
Author

mack-w commented May 20, 2025

Add these two lines to respective functions in simulator.c:

INFO("actual progress %d\n", progress);
INFO("actual num_of_caches %d\n", num_of_caches);

Before commit af3ff57:

[INFO]  simulator.c:307  (tid=126433726604224): actual progress 10
[INFO]  simulator.c:308  (tid=126433726604224): actual num_of_caches 10
ok 3 /libCacheSim/test_minisim_profiler_with_fixed_sample_rate

After commit af3ff57:

[INFO]  simulator.c:350  (tid=130706295354304): actual progress 9
[INFO]  simulator.c:351  (tid=130706295354304): actual num_of_caches 10
ERROR:test/test_mrcProfiler.cpp:188:void test_minisim_profiler_with_fixed_sample_rate(gconstpointer): assertion failed
not ok 3 /libCacheSim/test_minisim_profiler_with_fixed_sample_rate

The logic of the original code says:

while (progress < num_of_caches - 1) {
    print_progress((double)progress / (double)(num_of_caches - 1) * 100);
}

It seems that progress shall be num_of_caches minus 1, but with the original code (upstream HEAD) it is not the case. Is it designed like this on purpose? Or is it just a data race problem, as TSan reported?

Update: it is designed for purpose, my bad. Pushing a fix.

Anyways, marking this PR as ready for review.

@mack-w mack-w marked this pull request as ready for review May 20, 2025 11:08
@mack-w mack-w marked this pull request as draft May 20, 2025 11:19
The previous commit af3ff57 introduced pthread_cond_t to synchronize
progress. However, in some circumstances, the simulate_* functions may
exit prematurely due to incorrect conditional wait. This commit fix
the abovementioned problem.
@mack-w mack-w marked this pull request as ready for review May 20, 2025 11:26
@mack-w mack-w changed the title WIP: remove GLib dependency within bin/ Remove GLib dependency within bin/ May 20, 2025
@1a1a11a
Copy link
Owner

1a1a11a commented May 22, 2025

Yes, you can add a separate section for data structures in doc

@mack-w
Copy link
Author

mack-w commented May 22, 2025

Just noted that there seems to be a deadlock problem that I didn't notice while pushing. Trying to fix

@mack-w mack-w marked this pull request as draft May 22, 2025 06:55
In Release builds, access to params->n_caches is optimized to a stack
access. In Debug builds however, access is by pointer. Failure to
initialize this leads to infinite loop in _simulate, thus deadlock in
simulate_* functions.
@mack-w mack-w marked this pull request as ready for review May 22, 2025 08:16
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[optional] it would be good to have a benchmark for hash table.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall figure out a way to do so
Does it need to be a benchmark on libCacheSim, or a independent/use case-irrelevant benchmark?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An independent benchmark should be sufficient, and it would be good to compare with a common library, e.g., Glib, C++ std, or absl hashtable. BTW, I am also creating a new issue to build a performance test for libCacheSim.

Copy link
Author

@mack-w mack-w Jun 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

work in progress
@1a1a11a Where shall I put the benckmark? As a test program or just detail in the docs

@mack-w mack-w marked this pull request as draft May 30, 2025 05:13
@mack-w
Copy link
Author

mack-w commented May 30, 2025

@1a1a11a Sorry for not working this week, having some personal issues that I have to fix. Shall resume next week

@mack-w mack-w marked this pull request as ready for review June 3, 2025 08:33
@1a1a11a
Copy link
Owner

1a1a11a commented Jun 3, 2025

You can create a separate benchmark folder in the top directory, i.e., benchmarks/dataStructure/, would this work?

@mack-w
Copy link
Author

mack-w commented Jun 3, 2025

That makes sense considering that you want to add benchmark to the simulator.
Anyways I'm putting this as second in priority list, maybe I will open a separate PR for this :)

@1a1a11a
Copy link
Owner

1a1a11a commented Jun 13, 2025

Shall I take a look?

@mack-w
Copy link
Author

mack-w commented Jun 14, 2025

Shall I take a look?

Yeah I consider this PR "finished" (except for the benchmark part)
@1a1a11a Sorry I really had a lot of personal issues recently... I think next week I'll finally be able to look into smth with focus.

@1a1a11a
Copy link
Owner

1a1a11a commented Jun 16, 2025

@haochengxia Can you take a look at this PR?

Copy link

Quality Gate Failed Quality Gate failed

Failed conditions
1 Security Hotspot

See analysis details on SonarQube Cloud

@@ -85,7 +91,7 @@ int64_t get_stack_dist_add_req(const request_t *req, sTree **splay_tree,
newtree = insert(curr_ts, *splay_tree);
} else {
// not first time access
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When converting a pointer to an integer, can we use int64_t old_ts = (int64_t)(uintptr_t)gp; to avoid potential issues across archs?

BTW, as we removed glib, is gp still a good variable name?

#define cptr_t_ const void*

// platform-depedent
#define ptr_size_t_ size_t
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is now quite good, and I recommend using stdint.h for robustness and readability.

e.g.,
#define ptr_uintptr_t_ uintptr_t
#define ptr_intptr_t_ intptr_t

#define int_to_ptr(i) (ptr_t_)((ptr_intptr_t_)i)
#define uint_to_ptr(i) (ptr_t_)((ptr_uintptr_t_)i)
#define int_to_cptr(i) (cptr_t_)((ptr_intptr_t_)i)
#define uint_to_cptr(i) (cptr_t_)((ptr_uintptr_t_)i)

#define ptr_to_int(p) (int)((ptr_intptr_t_)p)
#define ptr_to_uint(p) (unsigned int)((ptr_uintptr_t_)p)
...

#include "../include/libCacheSim/profilerLRU.h"

#include "../dataStructure/splay.h"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can preserve the order of header files.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to preserve the order, but the current order is determined by clang-format.

@mack-w
Copy link
Author

mack-w commented Jun 19, 2025

@haochengxia Thanks for the code suggestions.
Also I see there's merge conflict. I will fix that and update my own branch soon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants