Skip to content

gds: init louvain #5155

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Apr 30, 2025
Merged

gds: init louvain #5155

merged 10 commits into from
Apr 30, 2025

Conversation

sdht0
Copy link
Contributor

@sdht0 sdht0 commented Mar 28, 2025

Implements basic parallel Louvain.

References:

This comment was marked as outdated.

This comment was marked as outdated.

This comment was marked as outdated.

@sdht0 sdht0 force-pushed the gds-louvain branch 2 times, most recently from e454fcb to 39edafc Compare April 9, 2025 15:12

This comment was marked as outdated.

@sdht0 sdht0 marked this pull request as ready for review April 10, 2025 14:24
Copy link

codecov bot commented Apr 10, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 86.57%. Comparing base (e1239d6) to head (d2933f4).
Report is 11 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5155      +/-   ##
==========================================
- Coverage   86.58%   86.57%   -0.01%     
==========================================
  Files        1410     1410              
  Lines       61973    61978       +5     
  Branches     7606     7607       +1     
==========================================
+ Hits        53657    53660       +3     
- Misses       8141     8143       +2     
  Partials      175      175              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

This comment was marked as outdated.

This comment was marked as outdated.

@sdht0
Copy link
Contributor Author

sdht0 commented Apr 10, 2025

Cannot reproduce the apple clang error locally. Perhaps failing because of an older version in the builder.

return nextCommId;
}

void aggregateCommunities(offset_t newCommCount, PhaseState& state) {
Copy link
Contributor Author

@sdht0 sdht0 Apr 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not parallelized yet. Might be a bottleneck on large graphs, although communities quickly get smaller in each phase.

TODO: check performance on large graphs

This comment was marked as outdated.

PhaseState& state;
};

class WriteResultsVC : public GDSResultVertexCompute {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to refactor and re-use some other ResultsWriter? What is needed so that we do not keep writing our own results writers.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see this addressed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will work on this in a future PR.

@sdht0 sdht0 force-pushed the gds-louvain branch 2 times, most recently from bd6e7e3 to eb43f48 Compare April 17, 2025 04:04

This comment was marked as outdated.

This comment was marked as outdated.

This comment was marked as outdated.

This comment was marked as outdated.

This comment was marked as outdated.

Copy link
Contributor

@semihsalihoglu-uw semihsalihoglu-uw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm approving in case you want to merge but again we need to do the inmemgraph differently and I don't think it should be a lot of work. So you might do it as part of this PR. But if you do so, I want to see again before you merge.

PhaseState& state;
};

class WriteResultsVC : public GDSResultVertexCompute {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see this addressed.

@sdht0 sdht0 force-pushed the gds-louvain branch 2 times, most recently from 2d15eb9 to 0e4a06d Compare April 18, 2025 13:58

This comment was marked as outdated.

This comment was marked as outdated.

Copy link
Contributor

@andyfengHKU andyfengHKU left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't go into details of the algorithm implementation. Let's test an its performance on various dataset and see.

I can already tell supporting multi labeled graph will not be trivial. So let's not do it. Supporting filtered graph is possible. I'll need to double check a few cases with InMemGraph.

@ray6080 check the allocation for gds_object_manager.

namespace kuzu {
namespace graph {

using weight_t = common::offset_t;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would keep this inside InMemGraph or Neighbor because not all algorithm's weight_t is integer. Some (weighted shortest path) could be arbitrary numerical value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will refactor once this is needed. Might have to switch InMemGraph to a template for the weight type.

struct PhaseState {
InMemGraph graph;
AtomicObjectArray<weight_t> nodeWeightedDegrees;
ObjectArray<CommInfo> currCommInfos;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the future it would be good to abstract

class CommInfos {

private 
    ObjectArray<CommInfo> ...;
}

class Communities {

private
    AtomicObjectArray<offset_t> ...
}

And expose more meaningful interface like Communities::getCommunity(offset) instead of state.currComm.get(offset)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leaving to a future PR.

@sdht0 sdht0 requested a review from benjaminwinger as a code owner April 30, 2025 15:06

This comment was marked as outdated.

This comment was marked as outdated.

This comment was marked as outdated.

This comment was marked as outdated.

This comment was marked as outdated.

@sdht0 sdht0 merged commit d53c8b2 into master Apr 30, 2025
28 checks passed
@sdht0 sdht0 deleted the gds-louvain branch April 30, 2025 23:58
ray6080 pushed a commit that referenced this pull request May 1, 2025
Implements basic parallel Louvain

References:
- https://en.wikipedia.org/wiki/Louvain_method
- Parallel Heuristics for Scalable Community Detection. Hao Lu, Mahantesh Halappanavar, and Ananth Kalyanaraman: https://arxiv.org/abs/1410.1237
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants