gds: init louvain #5155
Force-pushed from e454fcb to 39edafc.
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5155      +/-   ##
==========================================
- Coverage   86.58%   86.57%   -0.01%
==========================================
  Files        1410     1410
  Lines       61973    61978       +5
  Branches     7606     7607       +1
==========================================
+ Hits        53657    53660       +3
- Misses       8141     8143       +2
  Partials      175      175
I cannot reproduce the Apple Clang error locally. It is perhaps failing because the builder uses an older compiler version.
src/function/gds/louvain.cpp (outdated)
    return nextCommId;
}

void aggregateCommunities(offset_t newCommCount, PhaseState& state) {
This is not parallelized yet. It might become a bottleneck on large graphs, although the communities quickly get smaller in each phase.
TODO: check performance on large graphs.
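For readers unfamiliar with this step: aggregation collapses each community into a single super-node and merges the edges between communities by summing their weights. The sketch below shows that idea sequentially on a plain edge list. `Edge`, `aggregate`, and the `std::map` accumulator are hypothetical stand-ins written for this note, not the PR's `InMemGraph` or `PhaseState` code.

```cpp
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

using offset_t = uint64_t;
using weight_t = uint64_t;

// Hypothetical edge record; the PR keeps edges in InMemGraph instead.
struct Edge {
    offset_t src;
    offset_t dst;
    weight_t weight;
};

// Collapse every community into one super-node. comm[v] is assumed to be
// already renumbered into a dense range [0, newCommCount). Edges inside a
// community become self-loops on its super-node; parallel edges between
// two communities are merged by summing their weights.
std::vector<Edge> aggregate(const std::vector<Edge>& edges,
    const std::vector<offset_t>& comm) {
    std::map<std::pair<offset_t, offset_t>, weight_t> merged;
    for (const auto& e : edges) {
        merged[{comm[e.src], comm[e.dst]}] += e.weight;
    }
    std::vector<Edge> result;
    result.reserve(merged.size());
    for (const auto& [key, w] : merged) {
        result.push_back(Edge{key.first, key.second, w});
    }
    return result;
}
```

The output has at most one edge per ordered pair of communities, which is why the work per phase tends to drop as communities merge.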
src/function/gds/louvain.cpp (outdated)
    PhaseState& state;
};

class WriteResultsVC : public GDSResultVertexCompute {
Is it possible to refactor and reuse some other ResultsWriter? What is needed so that we do not keep writing our own results writers?
I don't see this addressed.
Will work on this in a future PR.
Force-pushed from bd6e7e3 to eb43f48.
I'm approving in case you want to merge, but again, we need to handle the InMemGraph differently, and I don't think it should be a lot of work, so you might do it as part of this PR. If you do, I want to review it again before you merge.
Force-pushed from 2d15eb9 to 0e4a06d.
I didn't go into the details of the algorithm implementation. Let's test its performance on various datasets and see.
I can already tell that supporting multi-labeled graphs will not be trivial, so let's not do it. Supporting filtered graphs is possible. I'll need to double-check a few cases with InMemGraph.
@ray6080 check the allocation for gds_object_manager.
src/include/graph/graph_mem.h (outdated)
namespace kuzu {
namespace graph {

using weight_t = common::offset_t;
I would keep this inside InMemGraph or Neighbor, because not every algorithm's weight_t is an integer. Some (e.g. weighted shortest path) could use arbitrary numerical values.
Will refactor once this is needed. Might have to switch InMemGraph to a template for the weight type.
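One way the weight type could become a template parameter is sketched below. `WeightedGraph`, its `Neighbor` entry, and `weightedDegree` are hypothetical illustrations written for this note under the assumption of a simple adjacency list; they are not Kuzu's actual `InMemGraph` interface.

```cpp
#include <cstdint>
#include <vector>

using offset_t = uint64_t;

// Hypothetical neighbor entry parameterized on the weight type W.
template<typename W>
struct Neighbor {
    offset_t nbr;
    W weight;
};

// Hypothetical adjacency-list graph; each algorithm picks its own W.
template<typename W>
class WeightedGraph {
public:
    // The alias lives inside the graph rather than at namespace scope.
    using weight_t = W;

    explicit WeightedGraph(offset_t numNodes) : adj(numNodes) {}

    void addEdge(offset_t src, offset_t dst, W weight) {
        adj[src].push_back(Neighbor<W>{dst, weight});
    }

    // Sum of the weights of all edges leaving the given node.
    W weightedDegree(offset_t node) const {
        W total{};
        for (const auto& n : adj[node]) {
            total += n.weight;
        }
        return total;
    }

private:
    std::vector<std::vector<Neighbor<W>>> adj;
};
```

With this shape, Louvain could instantiate `WeightedGraph<uint64_t>` while a weighted shortest path could use `WeightedGraph<double>`, at the cost of moving the implementation into headers.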
struct PhaseState {
    InMemGraph graph;
    AtomicObjectArray<weight_t> nodeWeightedDegrees;
    ObjectArray<CommInfo> currCommInfos;
In the future it would be good to abstract these as
class CommInfos {
private:
    ObjectArray<CommInfo> ...;
};
class Communities {
private:
    AtomicObjectArray<offset_t> ...;
};
and expose a more meaningful interface, like Communities::getCommunity(offset) instead of state.currComm.get(offset).
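A minimal sketch of what such a wrapper might look like, assuming `std::vector<std::atomic<offset_t>>` as a stand-in for the PR's `AtomicObjectArray<offset_t>`. The constructor and `setCommunity` are hypothetical additions for illustration; only the `getCommunity` name comes from the suggestion above.

```cpp
#include <atomic>
#include <cstdint>
#include <vector>

using offset_t = uint64_t;

// Hypothetical wrapper that hides the raw array behind named accessors.
class Communities {
public:
    // Start with every node in its own singleton community.
    explicit Communities(offset_t numNodes) : comm(numNodes) {
        for (offset_t i = 0; i < numNodes; ++i) {
            comm[i].store(i, std::memory_order_relaxed);
        }
    }

    // Community currently assigned to the given node.
    offset_t getCommunity(offset_t node) const {
        return comm[node].load(std::memory_order_relaxed);
    }

    // Move the given node into another community.
    void setCommunity(offset_t node, offset_t community) {
        comm[node].store(community, std::memory_order_relaxed);
    }

private:
    // Stand-in for AtomicObjectArray<offset_t>.
    std::vector<std::atomic<offset_t>> comm;
};
```

Call sites would then read `communities.getCommunity(offset)` and no longer depend on how the array is stored.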
Leaving to a future PR.
Implements basic parallel Louvain.
References:
- https://en.wikipedia.org/wiki/Louvain_method
- Parallel Heuristics for Scalable Community Detection. Hao Lu, Mahantesh Halappanavar, and Ananth Kalyanaraman: https://arxiv.org/abs/1410.1237