Commit 2ed1579 ("load spack")

1 parent e9b3cef

File tree

2 files changed: +47 -42 lines changed

.github/workflows/ci.yml

Lines changed: 6 additions & 0 deletions

@@ -23,6 +23,12 @@ jobs:
       GITHUB_OAUTH: ${{ secrets.CR_PAT_WORKFLOW }}
     - name: Build
       run: |
+        if [ -f /data/cemosis/spack/share/spack/setup-env.sh ]; then
+          source /data/cemosis/spack/share/spack/setup-env.sh
+          spacktivate feelpp
+        else
+          echo "Spack environment setup script not found."
+        fi
         npm install
         npm run antora
       working-directory: docs

docs/modules/kokkos/pages/advanced-concepts/hierarchical-parallelism.adoc

Lines changed: 41 additions & 42 deletions

@@ -3,17 +3,16 @@
 == Introduction
 
 [.text-justify]
-Kokkos' *hierarchical parallelism* is a paradigm that enables the exploitation of multiple levels of *shared-memory parallelism*, allowing developers to leverage increased parallelism in their computations for potential performance improvements. This framework supports various levels of parallelism, including thread teams, threads within a team, and vector lanes, which can be nested to create complex parallel structures.
+Kokkos' *hierarchical parallelism* is a paradigm that enables the exploitation of multiple levels of *shared-memory parallelism*, allowing developers to leverage increased parallelism in their computations for potential performance improvements. This framework supports various levels of parallelism, including thread teams, threads within a team, and vector lanes, which can be nested to create complex parallel structures [1][2][6].
 
 [.text-justify]
 The paradigm employs a two-tiered approach: an outer level, often implemented using a league of teams, which divides the overall workload into larger chunks, and an inner level, typically comprising threads within a team, which focuses on finer-grained parallelism within these chunks. *Thread teams*, a fundamental concept in Kokkos, represent collections of threads that can synchronize and share a common scratch pad memory.
 
-== hierarchical parallelism
+== Hierarchical parallelism
 
-[.text-justify]
-At the heart of Kokkos' *hierarchical parallelism* lies the ability to exploit multiple levels of *shared-memory parallelism*.
-This approach allows developers to map complex algorithms to the hierarchical nature of modern hardware, from multi-core CPUs to many-core GPUs, and leverage more parallelism in their computations, potentially leading to significant performance improvements. The framework supports various levels of parallelism, including thread teams, threads within a team, and vector lanes, which can be nested to create complex parallel structures [1][2][6].
+At the heart of Kokkos' *hierarchical parallelism* lies the ability to exploit multiple levels of *shared-memory parallelism*.
+This approach allows developers to map complex algorithms to the hierarchical nature of modern hardware, from multi-core CPUs to many-core GPUs, and leverage more parallelism in their computations, potentially leading to significant performance improvements. The framework supports various levels of parallelism, including thread teams, threads within a team, and vector lanes, which can be nested to create complex parallel structures.
 
 *Similarities and Differences Between Outer and Inner Levels of Parallelism*

@@ -50,43 +49,43 @@ Well-coordinated teams can significantly boost performance by:

This hunk re-indents the example below; its content is otherwise unchanged:

[source, c++]
----
struct HierarchicalParallelism {
  Kokkos::View<double**> matrix;

  HierarchicalParallelism(int N, int M) : matrix("matrix", N, M) {}

  KOKKOS_INLINE_FUNCTION
  void operator()(const Kokkos::TeamPolicy<>::member_type& team_member) const {
    const int i = team_member.league_rank();
    Kokkos::parallel_for(Kokkos::TeamThreadRange(team_member, matrix.extent(1)),
      [&] (const int j) {
        matrix(i, j) = i * matrix.extent(1) + j;
      });

    team_member.team_barrier();
    if (team_member.team_rank() == 0) {
      double sum = 0.0;
      Kokkos::parallel_reduce(Kokkos::TeamThreadRange(team_member, matrix.extent(1)),
        [&] (const int j, double& lsum) {
          lsum += matrix(i, j);
        }, sum);

      Kokkos::single(Kokkos::PerTeam(team_member), [&] () {
        matrix(i, 0) = sum;
      });
    }
  }
};

int main(int argc, char* argv[]) {
  Kokkos::initialize(argc, argv);
  {
    const int N = 1000;
    const int M = 100;
    HierarchicalParallelism functor(N, M);
    Kokkos::parallel_for(Kokkos::TeamPolicy<>(N, Kokkos::AUTO), functor);
  }
  Kokkos::finalize();
  return 0;
}
----
Hierarchical parallelism is implemented as follows:
@@ -236,7 +235,7 @@ Explanations:
 *** Scratch memory can be used with the TeamPolicy to provide thread- or team-private memory.
 *** Scratch memory exposes on-chip user-managed caches (e.g. on NVIDIA GPUs).
 *** The size must be determined before launching a kernel.
-*** Two levels are available: large/slow and small/fast.
+*** Two levels are available: large/slow and small/fast.

* *Token*
