Beyond Flatland: A Placement Flow for 3D FPGAs

This repository extends the Verilog-to-Routing (VTR) framework (release v9.0.0) with new 3D-aware placement capabilities. These modifications are detailed in the paper "Beyond Flatland: A Placement Flow for 3D FPGAs" submitted to DAC'26.

Overview

This work introduces a novel placement flow specifically designed for 3D FPGAs, extending the standard 2D VTR placement algorithm with:

- 3D-aware partitioning using TritonPart for initial layer assignment
- Novel move generators optimized for 3D FPGA architectures
- Timing-aware layer optimization with dynamic parameter adjustment
- Support for multiple 3D interconnect strategies (connection-box, switch-box, and hybrid)

The implementation significantly improves critical path delay and wirelength compared to baseline 2D and naive 3D approaches; per-benchmark results are provided in the DAC26_results/ directory.

Key Modifications to VTR v9.0.0

This repository includes the following major enhancements to the base VTR framework:

1. TritonPart Integration for Layer Assignment

Files: vpr/src/base/partition_creator.cpp, partitioning_engine.cpp, hyper_graph.cpp

- Post-packing hypergraph partitioning to assign clustered blocks to layers
- Integration with OpenROAD's TritonPart tool for timing-aware partitioning
- Criticality-driven partitioning to minimize inter-layer connections
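TritonPart's actual cost function is not reproduced here. As a rough, hypothetical sketch of what "criticality-driven partitioning to minimize inter-layer connections" means, assume each net (hyperedge) costs 1 + alpha * criticality whenever its blocks land on more than one layer; a good layer assignment keeps that total low. The function name `weighted_cut` and the `alpha` weighting are illustrative assumptions, not the code in partition_creator.cpp.

```python
from typing import Dict, Sequence


def weighted_cut(nets: Sequence[Sequence[str]], criticality: Sequence[float],
                 layer_of: Dict[str, int], alpha: float = 1.0) -> float:
    """Total weight of nets whose blocks span more than one layer.

    Hypothetical weighting: each cut net costs 1 + alpha * criticality, so
    cutting a timing-critical net is penalized more than a non-critical one.
    """
    cost = 0.0
    for pins, crit in zip(nets, criticality):
        # A net is "cut" if its blocks are assigned to two or more layers.
        if len({layer_of[block] for block in pins}) > 1:
            cost += 1.0 + alpha * crit
    return cost


if __name__ == "__main__":
    nets = [["a", "b"], ["b", "c", "d"], ["c", "d"]]
    crit = [0.9, 0.2, 0.0]  # net a-b is the most timing-critical
    keep_critical = {"a": 0, "b": 0, "c": 1, "d": 1}
    split_critical = {"a": 0, "b": 1, "c": 1, "d": 1}
    print(weighted_cut(nets, crit, keep_critical))   # cuts only the 0.2 net
    print(weighted_cut(nets, crit, split_critical))  # cuts the 0.9 net
```

Both assignments cut exactly one net, but the criticality weighting prefers the one that keeps the critical a-b connection on a single layer.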

2. 3D-Aware Placement Move Generators

Directory: vpr/src/place/move_generators/

New move generator implementations designed for 3D placement:

- Layer Swap Moves (`layer_swap_move_generator.cpp`, `layer_swap_ranged_move_generator.cpp`): Probabilistically swap blocks between layers
- Probabilistic Layer Assignment (`*_probabilistic.cpp` variants): Centroid, median, and weighted variants that take layer assignments into account
- No Layer Change Variants (`*_no_layer_change.cpp`): Traditional moves that keep each block on its initially assigned layer
- Critical Layer Moves (`critical_layer_move_generator.cpp`): Timing-driven layer reassignment
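The move generator sources are not reproduced here. The following is a hypothetical sketch of how a probabilistic centroid move could choose its target, assuming it places the block at the rounded (x, y) centroid of its connected blocks and samples the target layer in proportion to how many of those blocks sit on each layer. The function name and the sampling rule are assumptions, not the logic of the `*_probabilistic.cpp` generators.

```python
import random
from collections import Counter
from typing import List, Tuple

Loc = Tuple[int, int, int]  # (x, y, layer)


def probabilistic_centroid_target(neighbors: List[Loc], rng: random.Random) -> Loc:
    """Hypothetical centroid move target that also picks a layer."""
    if not neighbors:
        raise ValueError("block has no connected neighbors")
    n = len(neighbors)
    # Planar target: rounded centroid of the connected blocks' (x, y) positions.
    cx = round(sum(x for x, _, _ in neighbors) / n)
    cy = round(sum(y for _, y, _ in neighbors) / n)
    # Layer target: sampled in proportion to how many neighbors occupy each layer.
    counts = Counter(layer for _, _, layer in neighbors)
    layers = sorted(counts)
    weights = [counts[ly] for ly in layers]
    layer = rng.choices(layers, weights=weights, k=1)[0]
    return (cx, cy, layer)


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed so the sampled layer is repeatable
    print(probabilistic_centroid_target([(2, 4, 0), (4, 6, 1), (6, 8, 1)], rng))
```

Sampling the layer, rather than always taking the majority layer, still lets the annealer occasionally try the minority layer, which is one plausible reason to make the layer choice probabilistic.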

3. Enhanced Timing Optimization

Command-line parameters:

- `--timing_tradeoff_adjustor`: Controls how the timing vs. wirelength tradeoff changes during annealing (linear, exponential, etc.)
- `--timing_layer_weight_adjustor`: Adjusts the weight given to inter-layer connections during placement
- `--rl_agent_move_set`: Selects the adaptive move generation strategy (e.g., prob_layer_swap)
- `--timing_tradeoff_start/end`: Start and end values of the timing tradeoff
- `--timing_layer_weight_start/end`: Start and end values of the layer weight penalty
- `--timing_layer_weight_start_sr/end_sr`: Start and end acceptance rates for applying the layer weight penalty
- `--timing_tradeoff_start_sr/end_sr`: Start and end acceptance rates for varying the timing tradeoff
- `--partition_post_pack`: Partitions the packed netlist across layers using TritonPart
- `--soft_partitioning init_place`: Uses the partitioning results only for the initial placement and relaxes layer-assignment constraints afterwards
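The exact formulas behind the adjustors are not spelled out in this README. As a hypothetical sketch, assume the `linear` adjustor moves a parameter from its start value to its end value as the annealer's move acceptance rate falls from `start_sr` to `end_sr`, holding it constant outside that window. The function name `linear_schedule` is illustrative only.

```python
def linear_schedule(acc_rate: float, start: float, end: float,
                    start_sr: float, end_sr: float) -> float:
    """Hypothetical linear adjustor driven by the move acceptance rate.

    Returns `start` while acc_rate >= start_sr, `end` once acc_rate <= end_sr,
    and interpolates linearly in between (assumes start_sr > end_sr, since the
    acceptance rate falls as annealing cools).
    """
    if start_sr <= end_sr:
        raise ValueError("expected start_sr > end_sr")
    # Progress through the acceptance-rate window, clamped to [0, 1].
    t = (start_sr - acc_rate) / (start_sr - end_sr)
    t = min(max(t, 0.0), 1.0)
    return (1.0 - t) * start + t * end


if __name__ == "__main__":
    # Timing tradeoff values from the LSTM example in this README.
    for acc in (1.0, 0.8, 0.5, 0.15):
        print(acc, round(linear_schedule(acc, 0.03, 0.51, 1.0, 0.15), 4))
```

Under this assumption, the LSTM settings under Reproducing Paper Results would move the timing tradeoff from 0.03 at an acceptance rate of 1.0 to 0.51 once the rate drops to 0.15, while the layer weight would stay at 1.6 until the rate falls below 0.41.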

4. 3D Interconnect Architecture Support

Seven architecture configurations supporting different via placement strategies (see Architecture Files section below).

Base Framework

This work builds on the official VTR release v9.0.0:
🔗 VTR v9.0.0 Source
📘 VTR Documentation

Prerequisites

System Requirements

- Operating System: Linux (64-bit), tested on Ubuntu 20.04/22.04
- Compiler: GCC 9.0+ or Clang 10.0+ with C++17 support
- Memory: Minimum 8GB RAM (16GB+ recommended for larger benchmarks)
- Disk Space: ~10GB for full build and benchmarks

Required Dependencies

Standard VTR Dependencies:
- CMake 3.16+
- Python 3.6+
- Flex and Bison
- Cairo graphics library
- Eigen3

OpenROAD (for TritonPart):

This flow integrates with TritonPart for partitioning, which requires the OpenROAD toolchain.
Please install OpenROAD following the instructions at:
🔗 https://theopenroadproject.org/

Configuration

After installing OpenROAD, you must update two file paths:

Update the path to tritonpart_run_script.sh:

Edit vpr/src/base/partition_creator.cpp at line 100:
```
std::string triton_path = "/path/to/tritonpart_run_script.sh";
```
Change this to the absolute path of tritonpart_run_script.sh in your repository.

Update the path to the openroad executable:

Edit tritonpart_run_script.sh at line 4:
```
OR_EXEC="/path/to/OpenROAD/build/bin/openroad"
```
Change this to point to your local OpenROAD installation.
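Both edits only replace a placeholder string, so they can also be scripted. A minimal sketch, with a hypothetical helper name, that swaps a placeholder for a real path and fails loudly if the placeholder is missing (which also catches the case where the quoted line numbers have drifted or the file was already edited):

```python
from pathlib import Path


def set_placeholder_path(file: Path, placeholder: str, new_path: str) -> int:
    """Replace every occurrence of `placeholder` in `file` with `new_path`.

    Returns the number of replacements and raises if the placeholder is absent,
    so a stale or already-edited file is noticed instead of silently ignored.
    """
    text = file.read_text()
    count = text.count(placeholder)
    if count == 0:
        raise ValueError(f"{placeholder!r} not found in {file}")
    file.write_text(text.replace(placeholder, new_path))
    return count


# Example usage from the repository root (adapt the OpenROAD path):
# set_placeholder_path(Path("vpr/src/base/partition_creator.cpp"),
#                      "/path/to/tritonpart_run_script.sh",
#                      str(Path("tritonpart_run_script.sh").resolve()))
# set_placeholder_path(Path("tritonpart_run_script.sh"),
#                      "/path/to/OpenROAD/build/bin/openroad",
#                      "/your/OpenROAD/build/bin/openroad")
```

Because partition_creator.cpp is compiled into VPR, rebuild (Step 4 below) after changing it.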

Building the Tool

Step 1: Clone and Initialize

```
git clone <repository-url>
cd vtr-3d
git submodule init
git submodule update
```

Step 2: Install System Dependencies

For Ubuntu/Debian systems:

```
./install_apt_packages.sh
```

For other Linux distributions, please refer to the VTR Building Guide.

Step 3: Setup Python Environment

Create and activate a Python virtual environment:

```
make env
source .venv/bin/activate
pip install -r requirements.txt
```

Note: You will need to activate the virtual environment (source .venv/bin/activate) in each new terminal session before running VTR tools.

Step 4: Build VTR

```
make -j$(nproc)
```

This will build all required tools including VPR, ODIN II, and ABC. The build process may take 10-30 minutes depending on your system.

Step 5: Verify Installation

Run a basic regression test to verify the build:

```
./vtr_flow/scripts/run_vtr_task.py ./vtr_flow/tasks/regression_tests/vtr_reg_basic/basic_timing
```

Expected output should show all tests passing with "OK" status.

Architecture Files

The Architecture_Files/ directory contains seven 3D FPGA architecture configurations used in the paper experiments:

| Architecture File | Description | Via Strategy | Use Case |
|---|---|---|---|
| `cb_architecture.xml` | Connection-box based vias | Vias placed at connection boxes | Balanced approach for general circuits |
| `cb_i_architecture.xml` | CB-based with input optimization | Input-optimized via placement | Circuits with high fanin |
| `cb_o_architecture.xml` | CB-based with output optimization | Output-optimized via placement | Circuits with high fanout |
| `sb_architecture.xml` | Switch-box based vias | Vias placed at switch boxes | Maximum routing flexibility |
| `hybrid_architecture.xml` | Hybrid CB+SB approach | Mixed CB and SB via placement | Combined benefits of both strategies |
| `hybrid_i_architecture.xml` | Hybrid with input optimization | Input-optimized hybrid | High fanin with routing flexibility |
| `hybrid_o_architecture.xml` | Hybrid with output optimization | Output-optimized hybrid | High fanout with routing flexibility |

All architectures are based on the VTR k6_N10 architecture (K=6 LUTs, N=10 BLEs per CLB) with 40nm technology parameters, extended with 3D capabilities and modified DSP/BRAM blocks as described in the paper.

Benchmarks

The Benchmarks/ directory contains 20 machine learning and deep neural network accelerator benchmark circuits:

| Benchmark | Description | Type |
|---|---|---|
| `attention_layer.blif.tar.gz` | Transformer attention mechanism | ML Layer |
| `bnn.blif.tar.gz` | Binary Neural Network | Full Network |
| `clstm_like.small/medium/large.blif.tar.gz` | Convolutional LSTM variants | Sequence Processing |
| `conv_layer.blif.tar.gz` | Convolution layer | ML Layer |
| `conv_layer_hls.blif.tar.gz` | HLS-generated convolution | ML Layer |
| `dla_like.small/medium.blif.tar.gz` | Deep Learning Accelerator variants | DLA Core |
| `eltwise_layer.blif.tar.gz` | Element-wise operations | ML Layer |
| `gemm_layer.blif.tar.gz` | General Matrix Multiply | ML Layer |
| `lstm.blif.tar.gz` | Long Short-Term Memory | Sequence Processing |
| `reduction_layer.blif.tar.gz` | Reduction operations | ML Layer |
| `robot_rl.blif.tar.gz` | Reinforcement learning for robotics | RL Application |
| `softmax.blif.tar.gz` | Softmax activation | ML Layer |
| `spmv.blif.tar.gz` | Sparse Matrix-Vector Multiply | Linear Algebra |
| `tpu_like.small/large.os/ws.blif.tar.gz` | TPU-like systolic array variants | Systolic Array |

Note: All benchmarks are compressed as .tar.gz files and must be extracted before use (see Running Experiments section).

Running Experiments

Extracting Benchmarks

Before running experiments, extract the benchmark files:

```
cd Benchmarks
tar -xzf lstm.blif.tar.gz
cd ..
```

To extract all benchmarks at once:

```
cd Benchmarks
for f in *.tar.gz; do tar -xzf "$f"; done
cd ..
```

Basic 3D Placement Run

Run VPR with a 3D architecture:

```
./build/vpr/vpr Architecture_Files/cb_architecture.xml Benchmarks/lstm.blif \
    --timing_tradeoff_adjustor linear \
    --timing_layer_weight_adjustor linear \
    --rl_agent_move_set prob_layer_swap \
    --seed 1
```

Reproducing Paper Results

To reproduce the results from Table 3 and Figure 7, use the optimized parameters found in the DAC26_results/ CSV files. For example, to reproduce the LSTM result with CB architecture (seed 5):

```
./build/vpr/vpr Architecture_Files/cb_architecture.xml Benchmarks/lstm.blif \
    --timing_tradeoff_adjustor linear \
    --timing_layer_weight_adjustor linear \
    --rl_agent_move_set prob_layer_swap \
    --timing_tradeoff_start 0.03 \
    --timing_tradeoff_end 0.51 \
    --timing_tradeoff_start_sr 1.0 \
    --timing_tradeoff_end_sr 0.15 \
    --timing_layer_weight_start 1.6 \
    --timing_layer_weight_end 1.0 \
    --timing_layer_weight_start_sr 0.41 \
    --timing_layer_weight_end_sr 0.32 \
    --partition_post_pack \
    --soft_partitioning init_place \
    --route_chan_width 300 \
    --seed 5
```

Running with Different Architectures

Test different 3D interconnect strategies:

```
# Switch-box based
./build/vpr/vpr Architecture_Files/sb_architecture.xml Benchmarks/bnn.blif --seed 1

# Hybrid approach
./build/vpr/vpr Architecture_Files/hybrid_architecture.xml Benchmarks/attention_layer.blif --seed 1

# Input-optimized hybrid
./build/vpr/vpr Architecture_Files/hybrid_i_architecture.xml Benchmarks/gemm_layer.blif --seed 1
```
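To run one benchmark on all seven architectures over several seeds, the command lines can be generated instead of typed. A small sketch with a hypothetical helper name, assuming the `./build/vpr/vpr` path, the `<prefix>_architecture.xml` naming, and extracted `.blif` files as used above; it only prints the commands (a dry run):

```python
import shlex
from typing import List

# Architecture prefixes from Architecture_Files/ (<prefix>_architecture.xml).
ARCHS = ["cb", "cb_i", "cb_o", "sb", "hybrid", "hybrid_i", "hybrid_o"]


def sweep_commands(benchmark: str, seeds: List[int],
                   vpr: str = "./build/vpr/vpr") -> List[List[str]]:
    """One VPR command (as an argv list) per (architecture, seed) pair."""
    cmds = []
    for arch in ARCHS:
        arch_file = f"Architecture_Files/{arch}_architecture.xml"
        for seed in seeds:
            cmds.append([vpr, arch_file, f"Benchmarks/{benchmark}.blif",
                         "--seed", str(seed)])
    return cmds


if __name__ == "__main__":
    # Dry run: print shell-quoted commands. To execute, pass each argv list
    # to subprocess.run(cmd, check=True) instead.
    for cmd in sweep_commands("softmax", seeds=[1, 2, 3]):
        print(" ".join(shlex.quote(arg) for arg in cmd))
```

Only `--seed` is passed, as in the examples above; append the tuned options from the results CSVs to approach the paper's numbers.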

Results Files

The DAC26_results/ directory contains experimental results presented in the paper:

Baseline Results

Files prefixed with vtr9_ contain baseline results using vanilla VTR v9.0.0:

- `vtr9_2d_results.csv`: 2D FPGA baseline (no 3D capabilities)
- `vtr9_3d_cb_results.csv`: VTR v9.0.0 with 3D CB architecture
- `vtr9_3d_cb_i_results.csv`: VTR v9.0.0 with 3D CB input-optimized architecture
- `vtr9_3d_cb_o_results.csv`: VTR v9.0.0 with 3D CB output-optimized architecture
- `vtr9_3d_sb_results.csv`: VTR v9.0.0 with 3D SB architecture
- `vtr9_hybrid_results.csv`: VTR v9.0.0 with 3D hybrid architecture
- `vtr9_hybrid_i_results.csv`: VTR v9.0.0 with 3D hybrid input-optimized architecture
- `vtr9_hybrid_o_results.csv`: VTR v9.0.0 with 3D hybrid output-optimized architecture
- `vtr9_cb_runtime_results.csv`: VTR v9.0.0 with 3D CB architecture, with all runtime values

Our Results

Files prefixed with our_ contain results using the proposed 3D-aware placement flow:

- `our_3d_cb_results.csv`: Proposed flow with CB architecture
- `our_3d_cb_i_results.csv`: Proposed flow with CB input-optimized architecture
- `our_3d_cb_o_results.csv`: Proposed flow with CB output-optimized architecture
- `our_3d_sb_results.csv`: Proposed flow with SB architecture
- `our_hybrid_results.csv`: Proposed flow with hybrid architecture
- `our_hybrid_i_results.csv`: Proposed flow with hybrid input-optimized architecture
- `our_hybrid_o_results.csv`: Proposed flow with hybrid output-optimized architecture
- `our_cb_runtime_results.csv`: Proposed flow with CB architecture, with all runtime values

Note: The result files without "runtime" in their names were generated on a busy shared server with many jobs running concurrently, so the runtime values in those files may be unreliable because of varying CPU contention. The two runtime files (`vtr9_cb_runtime_results.csv` and `our_cb_runtime_results.csv`) were produced on an otherwise idle server running only the experiment workloads, so their runtime measurements are consistent and unaffected by external system load.

CSV File Format

Each CSV file contains the following columns:

- `success`: Boolean indicating successful completion
- `return_code`: Exit code (0 = success)
- `seed`: Random seed used for the run
- `blif_file`: Benchmark circuit name
- `cpd`: Critical path delay in nanoseconds
- `wl`: Total wirelength
- `runtime`: Total runtime in seconds
- `placement_runtime`: Placement-only runtime in seconds
- `config`: Dictionary of placement parameters used
- `error`: Error message (if any)

Runtime files have the following additional columns:

- `packing_runtime`: Packing time in seconds
- `load_packing_runtime`: Time to load the packing results in seconds
- `partitioning_runtime`: Time spent in TritonPart in seconds
- `create_device_runtime`: Time spent creating the FPGA grid data structures in seconds
- `router_lookahead_runtime`: Time spent creating the router lookahead delay table in seconds
- `routing_runtime`: Routing time in seconds
- `analysis_runtime`: Analysis time in seconds
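A minimal sketch for computing mean and median CPD and wirelength over the successful runs in one of these CSVs, similar in spirit to the aggregates in Table 3 (the paper's exact aggregation may differ). It assumes the `success` column is serialized as `True`/`False`; the function names are hypothetical.

```python
import csv
import statistics
from typing import Dict, Iterable


def summarize(rows: Iterable[Dict[str, str]]) -> Dict[str, float]:
    """Mean/median of cpd and wl over successful runs only.

    ASSUMPTION: `success` is serialized as "True"/"False".
    """
    ok = [r for r in rows if r["success"].strip().lower() == "true"]
    if not ok:
        raise ValueError("no successful runs in input")
    cpd = [float(r["cpd"]) for r in ok]  # nanoseconds
    wl = [float(r["wl"]) for r in ok]
    return {
        "runs": len(ok),
        "cpd_mean": statistics.mean(cpd),
        "cpd_median": statistics.median(cpd),
        "wl_mean": statistics.mean(wl),
        "wl_median": statistics.median(wl),
    }


def summarize_file(path: str) -> Dict[str, float]:
    # newline="" lets the csv module handle quoted fields such as `config`.
    with open(path, newline="") as f:
        return summarize(csv.DictReader(f))


if __name__ == "__main__":
    print(summarize_file("DAC26_results/our_3d_cb_results.csv"))
```

Failed runs are filtered out before any numeric conversion, so empty `cpd`/`wl` cells in those rows do not raise.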

Paper Correspondence

- Table 3 presents aggregate statistics (mean/median CPD and wirelength) across architectures and benchmarks.
- Figure 7 shows placement quality convergence during the annealing process for selected benchmarks.

Additional Notes for Reviewers

Important Configuration Notes

- TritonPart Dependency: The TritonPart integration is included in the code, but the partitioning step can be disabled by commenting out the partitioning calls in partition_creator.cpp. The flow still benefits from the 3D-aware move generators.
- Runtime: Large benchmarks (e.g., clstm_like.large, tpu_like.large) may require several hours for full place-and-route. For quick testing, use smaller benchmarks such as softmax or attention_layer.
- Determinism: Results are deterministic for a given seed. Multiple seeds were used in the experiments to account for the randomness of the placement algorithm.
- Memory Usage: Large benchmarks may require 16GB+ RAM. If you encounter out-of-memory errors, try smaller benchmark variants or increase the system swap space.

Reproducing Results

For full reproducibility of paper results:

- Use the exact parameter configurations from the `config` column of the results CSVs.
- Match the seed values (seeds 1-10 were used for most experiments).
- Use the architecture file indicated by the results filename.
- Extract timing and wirelength from the VPR output, or use the VTR flow scripts for automated result collection.
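The `config` column can be turned back into VPR options. Its serialization format is not documented here, so this sketch assumes a Python dict literal keyed by option names without the leading `--`, with `True` marking a bare flag such as `--partition_post_pack`; if the files use another format (e.g., JSON with `true`), adjust the parsing. Function names are hypothetical. The architecture file comes from the results filename and the circuit from `blif_file` (prefix `Benchmarks/` if needed).

```python
import ast
from typing import List


def config_to_args(config: str) -> List[str]:
    """Turn a results-CSV `config` cell into VPR command-line arguments.

    ASSUMPTION: the cell is a Python dict literal such as
    "{'timing_tradeoff_start': 0.03, 'partition_post_pack': True}".
    """
    params = ast.literal_eval(config)  # safe parsing of literals only
    args: List[str] = []
    for key, value in params.items():
        if key == "seed":
            continue  # taken from the separate `seed` column instead
        opt = "--" + key.lstrip("-")
        if value is True:
            args.append(opt)  # bare flag, e.g. --partition_post_pack
        elif value is False or value is None:
            continue  # disabled flag or unset option: leave it off
        else:
            args += [opt, str(value)]
    return args


def build_command(arch_file: str, blif_file: str, config: str, seed: int,
                  vpr: str = "./build/vpr/vpr") -> List[str]:
    """Full argv list: architecture, circuit, tuned options, then the seed."""
    return [vpr, arch_file, blif_file, *config_to_args(config), "--seed", str(seed)]
```

The returned list can be passed directly to `subprocess.run`, which avoids shell-quoting issues with option values.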

Tips for Evaluation

- Start with smaller benchmarks (softmax, attention_layer) to verify the setup.
- Compare the `vtr9_*.csv` and `our_*.csv` files to see the improvements.
- The placement_runtime values show the efficiency of the placement algorithm.
- Critical path delay (cpd) improvements of 10-30% are typical for ML workloads.
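To compare a baseline file with the matching proposed-flow file (e.g., `vtr9_3d_cb_results.csv` against `our_3d_cb_results.csv`), the sketch below averages a metric per benchmark over successful seeds and reports the percentage reduction; positive values mean the proposed flow is lower (better). It assumes `success` is written as `True`/`False`, uses hypothetical function names, and is only one way to summarize, not necessarily how Table 3 was computed.

```python
import csv
import statistics
from collections import defaultdict
from typing import Dict, Iterable, List


def per_benchmark_mean(rows: Iterable[Dict[str, str]], metric: str) -> Dict[str, float]:
    """Mean of `metric` ("cpd" or "wl") per blif_file, successful runs only."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for r in rows:
        if r["success"].strip().lower() == "true":  # ASSUMPTION: "True"/"False"
            groups[r["blif_file"]].append(float(r[metric]))
    return {bench: statistics.mean(vals) for bench, vals in groups.items()}


def improvement_pct(baseline: Dict[str, float], ours: Dict[str, float]) -> Dict[str, float]:
    """Percent reduction of `ours` vs. `baseline` on benchmarks present in both."""
    shared = sorted(baseline.keys() & ours.keys())
    return {b: 100.0 * (baseline[b] - ours[b]) / baseline[b] for b in shared}


def compare_files(baseline_csv: str, ours_csv: str, metric: str = "cpd") -> Dict[str, float]:
    with open(baseline_csv, newline="") as f:
        base = per_benchmark_mean(csv.DictReader(f), metric)
    with open(ours_csv, newline="") as f:
        ours = per_benchmark_mean(csv.DictReader(f), metric)
    return improvement_pct(base, ours)


if __name__ == "__main__":
    res = compare_files("DAC26_results/vtr9_3d_cb_results.csv",
                        "DAC26_results/our_3d_cb_results.csv")
    for bench, pct in res.items():
        print(f"{bench}: {pct:+.1f}% CPD reduction")
```

Averaging per benchmark before comparing keeps benchmarks with more successful seeds from dominating the comparison.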

Citation and License

Base VTR Framework

This work builds upon VTR v9.0.0. For more information about the VTR project:

- VTR Project: https://verilogtorouting.org/
- VTR v9.0.0 Release: https://github.com/verilog-to-routing/vtr-verilog-to-routing/releases/tag/v9.0.0

TritonPart

The partitioning functionality integrates OpenROAD's TritonPart:

- OpenROAD Project: https://theopenroadproject.org/
- TritonPart: Part of the OpenROAD physical design toolchain

License

This project retains VTR's MIT License. See LICENSE.md for full details.

The software is provided "as is" without warranty of any kind. All modifications and extensions to VTR are provided under the same MIT License terms.

For questions or issues related to this artifact, please open an issue in the repository.
