Skip to content

Hyper dK-series is a family of randomized reference models for hypergraphs.

License

Notifications You must be signed in to change notification settings

kazuibasou/hyper-dk-series

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Hyper dK-series
Randomizing Hypergraphs Preserving Degree Correlation and Local Clustering

A family of reference models for hypergraphs

License: MIT ARXIV: 2106.12162

Hyper dK-series is a family of randomization methods for hypergraphs. The hyper dK-series produces randomized hypergraphs that preserve up to the individual node’s degree, node’s degree correlation, node’s redundancy coefficient, and/or the hyperedge’s size of the given hypergraph, depending on the parameter values dv = 0, 1, 2, or 2.5 and de = 0 or 1. In general, when dv = 0 or 1, the code runs fast. When dv = 2, it takes longer. When dv = 2.5, it takes even longer than when dv = 2. The value of de (= 0 or 1) does not much affect the speed.

We provide code for the hyper dK-series in Python and C++. The python code is easier to use.

What's new

  • 2024/09/28: Implemented Python code with numba for the hyper dK-series, which is 20-30 times faster than the previous version for the parameter dv = 2 and 2.5 and achieves comparable generation time for C++ code. Randomized instances generated by the hyper dK-series in Python can be converted to the Hypergraph object of HyperNetX.

Python

Requirements

We have confirmed that our code works in the following environments.

Python and its libraries

  • Python 3.10.15
  • numpy 2.0.2
  • numba 0.60.0
  • matplotlib 3.9.2
  • hypernetx 2.3.0

OS

  • macOS 14.4

Usage

First, clone this repository:

git clone [email protected]:kazuibasou/hyper-dk-series.git

Input files

We need to feed two files, hypergraph_nverts.txt and hypergraph_hyperedges.txt, where hypergraph indicates the name of the network and is arbitrary. These two files should be placed in hyper-dk-series/data/. In each of the two files, each line contains one integer. The file format follows that of Austin R. Benson's data sets.

hypergraph_nverts.txt

The i th line of this file represents the number of nodes contained in the i th hyperedge.

hypergraph_hyperedges.txt

This file contains a contiguous list of indices of the nodes comprising the hyperedges, where the ordering of the hyperedges is the same as in hypergraph_nverts.txt.

Note that our code successfully works as long as the node index starts at 0 and increments by 1.

Example

Let's consider a hypergraph, named example-hypergraph, that consists of a set of nodes V = {0, 1, 2, 3, 4} and a set of hyperedges E = {{0, 1}, {1, 2}, {0, 1, 2}, {0, 1, 2, 3}, {0, 1, 2, 3, 4}}. Then, the two input files are as follows:

example-hypergraph_nverts.txt

2
2
3
4
5

example-hypergraph_hyperedges.txt

0
1
1
2
0
1
2
0
1
2
3
0
1
2
3
4

Generating randomized hypergraphs

Please see the notebooks 1_basics.ipynb and 2_randomization.ipynb in the folder hyper-dk-series/py/ for instructions on using the hyper dk-series in Python.

Benchmark (2024/10)

I measured the running time in seconds for generating a single randomized instance using the hyper dK-series implemented in Python with numba. I used the same four empirical hypergraph data sets (i.e., drug, Enron, primary-school, and high-school) as Ref. [1]. I ran all the code on a MacBook Pro with an Apple M3 Max tip and 128GB of main memory.

Yes, the hyper dK-series with dv=2 or 2.5 takes some time because of many rewiring attempts, but the current version is 20-30 times faster than the previous version without numba.

(dv, de) drug Enron primary-school high-school
(0, 0) 0.02 0.008 0.06 0.05
(1, 0) 0.008 0.005 0.03 0.02
(2, 0) 5.6 3.1 23.6 9.7
(2.5, 0) 101 42.7 709 188
(0, 1) 0.02 0.008 0.16 0.05
(1, 1) 0.008 0.005 0.03 0.02
(2, 1) 5.1 1.9 21.1 7.7
(2.5, 1) 97 40.0 605 166

C++

Requirements

Require gcc version 4.2.1 or later.

We have confirmed that our code works in the following environments.

  • macOS 14.4

Build

(i) Clone this repository:

git clone [email protected]:kazuibasou/hyper-dk-series.git

(ii) Go to hyper-dk-series/cpp/src/:

cd hyper-dk-series/cpp/src

(iii) Run the make command:

make

This generates the following structure of the directory.

hyper-dk-series/
├ cpp/
   ├ bin/
   └ src/
├ data/
├ py/
└ rand_hypergraph/

If you find a file hyper_dk_series in the folder hyper-dk-series/cpp/bin/, the build has been successfully completed.

Usage

Input files

We need to feed two files, hypergraph_nverts.txt and hypergraph_hyperedges.txt, where hypergraph indicates the name of the network and is arbitrary. These two files should be placed in hyper-dk-series/data/. In each of the two files, each line contains one integer. The file format follows that of Austin R. Benson's data sets.

hypergraph_nverts.txt

The i th line of this file represents the number of nodes contained in the i th hyperedge.

hypergraph_hyperedges.txt

This file contains a contiguous list of the nodes comprising the hyperedges, where the ordering of the hyperedges is the same as in hypergraph_nverts.txt.

Generating randomized hypergraphs

Go to hyper-dk-series/cpp/bin/ and run the following command:

./hyper_dk_series <hypergraph> <dv> <de> <num_gen>

The four arguments are as follows.

<hypergraph>

The name of the hypergraph.

<dv>

The value of dv, which should be 0, 1, 2, or 2.5.

<de>

The value of de, which should be 0 or 1.

<num_gen>

The number of hypergraphs to be generated.

Example

To generate three randomized hypergraphs with (dv, de) = (0, 1) for the hypergraph named example-hypergraph, go to hyper-dk-series/cpp/bin/ and run the following command:

./hyper_dk_series example-hypergraph 0 1 3

Output files

The n th (n=1, ..., num_gen) randomized hypergraph, i.e., hypergraph_dv_de_n_nverts.txt and hypergraph_dv_de_n_hyperedges.txt will be created in the folder hyper-dk-series/rand_hypergraph/.

Reference

[1] Kazuki Nakajima, Kazuyuki Shudo, Naoki Masuda. Randomizing Hypergraphs Preserving Degree Correlation and Local Clustering. December 2021. [paper]

License

This source code is released under the MIT License, see LICENSE.txt.

Contact

(Last update: 2024/09/28)

About

Hyper dK-series is a family of randomized reference models for hypergraphs.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published