CASP-Systems-BU/RingSampler

RingSampler: GNN sampling on large-scale graphs with io_uring

This repository includes the implementation of RingSampler described in the HotStorage '25 paper.

You can cite the paper using the BibTeX below:

@inproceedings{10.1145/3736548.3737829,
author = {Chen, Qixuan and Song, Yuhang and Martinez, Melissa and Kalavri, Vasiliki},
title = {RingSampler: GNN sampling on large-scale graphs with io_uring},
year = {2025},
isbn = {9798400719479},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3736548.3737829},
doi = {10.1145/3736548.3737829},
abstract = {Neighborhood sampling is a critical computation step in graph learning with Graph Neural Networks (GNNs), often accounting for the majority of the training time. To mitigate this bottleneck and scale training to very large graphs, existing approaches offload the sampling computation to GPUs or computational storage, such as SmartSSDs. Given the ubiquity of multi-core CPUs and high-throughput SSDs, we investigate a simpler design that performs CPU-based sampling, making GPU resources fully available to the aggregation stage of training instead. We propose RingSampler, a new GNN sampling system that leverages io_uring to support efficient training of billion-edge graphs on a single machine. RingSampler parallelizes sampling by transparently assigning mini-batches to threads and effectively overlapping computation with I/O operations. Our results show that RingSampler significantly outperforms SmartSSD-based sampling on large graphs and is competitive with GPU-accelerated approaches on graphs that fit in main memory.},
booktitle = {Proceedings of the 17th ACM Workshop on Hot Topics in Storage and File Systems},
pages = {52–60},
numpages = {9},
keywords = {Graph Neural Networks, Neighborhood sampling, io_uring},
location = {Boston, MA, USA},
series = {HotStorage '25}
}

Configuration

Hardware

  • CPU: AMD EPYC 7713P 64C/128T
  • DRAM size: 252GB
  • Disk: 4TB Samsung SSD
  • GPU: NVIDIA A100 80GB (not used by RingSampler)

Software

  • OS: Ubuntu 20.04
  • CMake: 3.20.2
  • liburing: built from source

Build and Running Instructions

Repository organization

  • src/: Contains the core implementation of RingSampler, including the use of io_uring, asynchronous sampling logic, and supporting utility functions.
  • preprocess/: Includes scripts to generate the required binary files (edges.bin, offset.bin, train_nodes.bin) for each dataset from raw text-based input.
  • tests/: Contains code to run RingSampler, performing multi-epoch GraphSAGE sampling on various datasets.
  • scripts/: Provides example scripts to run sampling on different datasets with configurable parameters.
  • relatedWork/: Contains instructions and scripts for running the baseline methods used for comparison in the paper.

Download datasets

Dataset preprocessing

To perform sampling with RingSampler, each dataset must be preprocessed into the following binary files:

  • edges.bin: A list of destination node IDs (uint32_t)
  • offset.bin: Offset indices for each node’s neighbor list (uint64_t)
  • train_nodes.bin: A list of training node IDs (uint32_t)

For a detailed explanation of these files and their structure, please refer to Section 3.1 (System Overview) of the paper.

Preprocessing code is provided in /gnn-sampling/preprocess/; it converts raw .txt edge lists into the required binary files.
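To make the file layout above concrete, here is a minimal sketch in C of looking up one node's neighbor list from offset.bin and edges.bin. It is not taken from the RingSampler sources and uses plain blocking stdio rather than io_uring. It assumes that offset.bin holds num_nodes + 1 entries counted in edges (not bytes), so that node v's neighbors occupy edges[offset[v] .. offset[v + 1]); Section 3.1 of the paper is the authoritative description. The helper names write_array and read_neighbors are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* ASSUMPTION (not confirmed by this README): offset.bin stores
 * num_nodes + 1 uint64_t entries, counted in edges rather than bytes,
 * so node v's neighbors are edges[offset[v] .. offset[v + 1]).
 * Values are stored in the machine's native byte order. */

/* Write `count` elements of `size` bytes each to `path`.
 * Returns 0 on success, -1 on failure. Hypothetical helper. */
int write_array(const char *path, const void *data, size_t size, size_t count) {
    FILE *f = fopen(path, "wb");
    if (!f) return -1;
    size_t written = fwrite(data, size, count, f);
    /* fclose runs first, so the file is closed even on a short write. */
    return (fclose(f) == 0 && written == count) ? 0 : -1;
}

/* Read the neighbor list of `node` into a newly allocated buffer.
 * Returns the neighbor count (0 leaves *out NULL), or -1 on error.
 * The caller frees *out. Hypothetical helper.
 * Note: fseek takes a long; where long is 32 bits, files beyond
 * 2 GiB would need fseeko or an equivalent. */
long read_neighbors(const char *offset_path, const char *edges_path,
                    uint32_t node, uint32_t **out) {
    assert(out != NULL);
    *out = NULL;

    /* Two consecutive offsets delimit this node's slice of edges.bin. */
    uint64_t range[2];
    FILE *fo = fopen(offset_path, "rb");
    if (!fo) return -1;
    int ok = fseek(fo, (long)(node * sizeof(uint64_t)), SEEK_SET) == 0 &&
             fread(range, sizeof(uint64_t), 2, fo) == 2;
    fclose(fo);
    if (!ok || range[1] < range[0]) return -1;

    size_t degree = (size_t)(range[1] - range[0]);
    if (degree == 0) return 0;

    uint32_t *buf = malloc(degree * sizeof(uint32_t));
    FILE *fe = fopen(edges_path, "rb");
    if (!buf || !fe) {
        free(buf);
        if (fe) fclose(fe);
        return -1;
    }
    ok = fseek(fe, (long)(range[0] * sizeof(uint32_t)), SEEK_SET) == 0 &&
         fread(buf, sizeof(uint32_t), degree, fe) == degree;
    fclose(fe);
    if (!ok) {
        free(buf);
        return -1;
    }
    *out = buf;
    return (long)degree;
}
```

For example, a 3-node graph with neighbor lists {1, 2}, {0}, and {0, 1} would be stored as edges = {1, 2, 0, 0, 1} and offsets = {0, 2, 3, 5} under this assumption, and read_neighbors for node 2 would return the two entries {0, 1}.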

Build

mkdir -p build && cd build
cmake -DCMAKE_BUILD_TYPE=Release ..
make -j

Run

Before running, make sure to:

  • Update dataset paths in: src/utils.h
  • Set the correct training node count in: tests/test_multi_epoch.c (variable: training_nodes)

# Run multi-epoch node-wise sampling
./test_multi_epoch <dataset> <QD> <num_threads> <epoch_num> <batch_size>

# Run the sample script (samples all supported datasets)
../experiments/nodewise_sampling.sh
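
For readers unfamiliar with node-wise sampling, the sketch below shows its core step for a single node: picking up to a fixed number of neighbors (the fanout) from that node's neighbor list. It is an illustration under stated assumptions, not RingSampler's implementation. It assumes uniform sampling without replacement, works on a neighbor list that is already in memory, and does not involve io_uring. The names sample_neighbors and xorshift64 and the fanout parameter are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Small xorshift64 generator, chosen here so that each thread could own
 * its own state (illustrative choice). The state must be nonzero. */
uint64_t xorshift64(uint64_t *state) {
    uint64_t x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    return x;
}

/* Sample up to `fanout` distinct entries from one node's neighbor list
 * using a partial Fisher-Yates shuffle (the slight modulo bias of the
 * index draw is ignored in this sketch).
 *   nbrs    : the node's neighbor list, `degree` entries
 *   scratch : caller-provided buffer with room for `degree` entries
 *   out     : caller-provided buffer with room for min(degree, fanout)
 *   rng     : nonzero generator state, updated in place
 * Returns the number of sampled neighbors, min(degree, fanout). */
size_t sample_neighbors(const uint32_t *nbrs, size_t degree, size_t fanout,
                        uint32_t *scratch, uint32_t *out, uint64_t *rng) {
    assert(rng != NULL && *rng != 0);
    size_t k = degree < fanout ? degree : fanout;
    if (k == 0) return 0;

    /* Shuffle a copy so the caller's neighbor list stays untouched. */
    memcpy(scratch, nbrs, degree * sizeof(uint32_t));
    for (size_t i = 0; i < k; i++) {
        /* Pick a position in [i, degree) and swap it into slot i. */
        size_t j = i + (size_t)(xorshift64(rng) % (degree - i));
        uint32_t tmp = scratch[i];
        scratch[i] = scratch[j];
        scratch[j] = tmp;
        out[i] = scratch[i];
    }
    return k;
}
```

A multi-layer sampler would apply this step to every node in a mini-batch and then repeat it on the sampled neighbors for the next layer. When a node has fewer neighbors than the fanout, this sketch returns all of them.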

Preview of RingSampler

The following diagram visualizes the implementation: workflow_v2
