A family of reference models for hypergraphs
Hyper dK-series is a family of randomization methods for hypergraphs. The hyper dK-series produces randomized hypergraphs that preserve up to the individual node’s degree, node’s degree correlation, node’s redundancy coefficient, and/or the hyperedge’s size of the given hypergraph, depending on the parameter values dv = 0, 1, 2, or 2.5 and de = 0 or 1. In general, when dv = 0 or 1, the code runs fast. When dv = 2, it takes longer. When dv = 2.5, it takes even longer than when dv = 2. The value of de (= 0 or 1) does not much affect the speed.
We provide code for the hyper dK-series in Python and C++. The python code is easier to use.
- 2024/09/28: Implemented Python code with numba for the hyper dK-series, which is 20-30 times faster than the previous version for the parameter dv = 2 and 2.5 and achieves comparable generation time for C++ code. Randomized instances generated by the hyper dK-series in Python can be converted to the Hypergraph object of HyperNetX.
We have confirmed that our code works in the following environments.
- Python 3.10.15
- numpy 2.0.2
- numba 0.60.0
- matplotlib 3.9.2
- hypernetx 2.3.0
- macOS 14.4
First, clone this repository:
git clone [email protected]:kazuibasou/hyper-dk-series.git
We need to feed two files, hypergraph_nverts.txt and hypergraph_hyperedges.txt, where hypergraph indicates the name of the network and is arbitrary.
These two files should be placed in hyper-dk-series/data/
.
In each of the two files, each line contains one integer.
The file format follows that of Austin R. Benson's data sets.
The i th line of this file represents the number of nodes contained in the i th hyperedge.
This file contains a contiguous list of indices of the nodes comprising the hyperedges, where the ordering of the hyperedges is the same as in hypergraph_nverts.txt.
Note that our code successfully works as long as the node index starts at 0 and increments by 1.
Let's consider a hypergraph, named example-hypergraph
, that consists of a set of nodes V = {0, 1, 2, 3, 4} and a set of hyperedges E = {{0, 1}, {1, 2}, {0, 1, 2}, {0, 1, 2, 3}, {0, 1, 2, 3, 4}}.
Then, the two input files are as follows:
example-hypergraph_nverts.txt
2
2
3
4
5
example-hypergraph_hyperedges.txt
0
1
1
2
0
1
2
0
1
2
3
0
1
2
3
4
Please see the notebooks 1_basics.ipynb
and 2_randomization.ipynb
in the folder hyper-dk-series/py/
for instructions on using the hyper dk-series in Python.
I measured the running time in seconds for generating a single randomized instance using the hyper dK-series implemented in Python with numba.
I used the same four empirical hypergraph data sets (i.e., drug
, Enron
, primary-school
, and high-school
) as Ref. [1].
I ran all the code on a MacBook Pro with an Apple M3 Max tip and 128GB of main memory.
Yes, the hyper dK-series with dv=2 or 2.5 takes some time because of many rewiring attempts, but the current version is 20-30 times faster than the previous version without numba.
(dv, de) | drug | Enron | primary-school | high-school |
---|---|---|---|---|
(0, 0) | 0.02 | 0.008 | 0.06 | 0.05 |
(1, 0) | 0.008 | 0.005 | 0.03 | 0.02 |
(2, 0) | 5.6 | 3.1 | 23.6 | 9.7 |
(2.5, 0) | 101 | 42.7 | 709 | 188 |
(0, 1) | 0.02 | 0.008 | 0.16 | 0.05 |
(1, 1) | 0.008 | 0.005 | 0.03 | 0.02 |
(2, 1) | 5.1 | 1.9 | 21.1 | 7.7 |
(2.5, 1) | 97 | 40.0 | 605 | 166 |
Require gcc version 4.2.1 or later.
We have confirmed that our code works in the following environments.
- macOS 14.4
(i) Clone this repository:
git clone [email protected]:kazuibasou/hyper-dk-series.git
(ii) Go to hyper-dk-series/cpp/src/
:
cd hyper-dk-series/cpp/src
(iii) Run the make command:
make
This generates the following structure of the directory.
hyper-dk-series/
├ cpp/
├ bin/
└ src/
├ data/
├ py/
└ rand_hypergraph/
If you find a file hyper_dk_series
in the folder hyper-dk-series/cpp/bin/
, the build has been successfully completed.
We need to feed two files, hypergraph_nverts.txt and hypergraph_hyperedges.txt, where hypergraph indicates the name of the network and is arbitrary.
These two files should be placed in hyper-dk-series/data/
.
In each of the two files, each line contains one integer.
The file format follows that of Austin R. Benson's data sets.
The i th line of this file represents the number of nodes contained in the i th hyperedge.
This file contains a contiguous list of the nodes comprising the hyperedges, where the ordering of the hyperedges is the same as in hypergraph_nverts.txt.
Go to hyper-dk-series/cpp/bin/
and run the following command:
./hyper_dk_series <hypergraph> <dv> <de> <num_gen>
The four arguments are as follows.
The name of the hypergraph.
The value of dv, which should be 0, 1, 2, or 2.5.
The value of de, which should be 0 or 1.
The number of hypergraphs to be generated.
To generate three randomized hypergraphs with (dv, de) = (0, 1) for the hypergraph named example-hypergraph
, go to hyper-dk-series/cpp/bin/
and run the following command:
./hyper_dk_series example-hypergraph 0 1 3
The n th (n=1, ..., num_gen) randomized hypergraph, i.e., hypergraph_dv_de_n_nverts.txt and hypergraph_dv_de_n_hyperedges.txt will be created in the folder hyper-dk-series/rand_hypergraph/
.
[1] Kazuki Nakajima, Kazuyuki Shudo, Naoki Masuda. Randomizing Hypergraphs Preserving Degree Correlation and Local Clustering. December 2021. [paper]
This source code is released under the MIT License, see LICENSE.txt.
- Kazuki Nakajima (https://kazuibasou.github.io/)
(Last update: 2024/09/28)