TBLIS 2.0 beta oversubscription causes performance to collapse

On Intel Lunar Lake, TBLIS 2.0 beta's performance collapses when it oversubscribes threads. Profiling shows that most of the execution time is spent in BLIS's `bli_thrcomm_barrier_atomic`.
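The sketch below illustrates why a busy-waiting barrier can degrade so badly when threads outnumber cores. It is a generic centralized sense-reversing spin barrier, not the actual code of BLIS's `bli_thrcomm_barrier_atomic`. `SpinBarrier` and `run_rounds` are hypothetical names. Threads that have already arrived keep polling an atomic flag. If the OS has preempted a thread that has not arrived yet, the spinners burn the CPU time that thread needs, so a single barrier episode can stretch to whole scheduler time slices.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical, generic sense-reversing spin barrier (NOT BLIS's implementation).
class SpinBarrier {
public:
    explicit SpinBarrier(int n) : n_(n), count_(n), sense_(false) {}

    // Each thread owns a local sense flag, initially false.
    void wait(bool& local_sense) {
        local_sense = !local_sense;
        if (count_.fetch_sub(1, std::memory_order_acq_rel) == 1) {
            // Last thread to arrive: reset the counter, then release everyone.
            count_.store(n_, std::memory_order_relaxed);
            sense_.store(local_sense, std::memory_order_release);
        } else {
            // Pure busy-wait. With more threads than cores, this loop occupies
            // a core that a not-yet-arrived thread may need to make progress.
            while (sense_.load(std::memory_order_acquire) != local_sense) {
            }
        }
    }

private:
    const int n_;
    std::atomic<int> count_;
    std::atomic<bool> sense_;
};

// Runs `threads` workers through `rounds` barrier episodes. Each worker adds 1
// to a shared counter per round and checks, after the barrier, that every
// worker's increment for that round is visible. Returns the final count, or -1
// if that check ever failed.
long run_rounds(int threads, int rounds) {
    SpinBarrier barrier(threads);
    std::atomic<long> total{0};
    std::atomic<bool> ok{true};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            bool sense = false;
            for (int r = 0; r < rounds; ++r) {
                total.fetch_add(1, std::memory_order_relaxed);
                barrier.wait(sense);
                if (total.load(std::memory_order_relaxed) < static_cast<long>(r + 1) * threads)
                    ok.store(false);
            }
        });
    }
    for (auto& th : pool) th.join();
    return ok.load() ? total.load() : -1;
}
```

Running `run_rounds` with a thread count well above the core count (for example, with `taskset` limiting the process to fewer cores) makes the slowdown from pure spinning easy to observe.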

TBLIS 1.3 does not suffer from the same issue. 

In both cases:
```
-- mutex type selected: pthread_spinlock
-- barrier type selected: spin_barrier
-- thread model selected: openmp
```
The `haswell` kernel was selected in both cases as well.
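Because the thread model is OpenMP, a quick sanity check before benchmarking is to compare the requested thread count with the hardware thread count. The helper below is a hypothetical sketch. It only relies on `std::thread::hardware_concurrency()`, which may return 0 when the value cannot be determined.

```cpp
#include <cassert>
#include <thread>

// Hypothetical helper: true when the requested worker count exceeds the number
// of hardware threads. If `hw` is 0 (unknown), it conservatively reports false.
bool is_oversubscribed(unsigned requested,
                       unsigned hw = std::thread::hardware_concurrency()) {
    if (hw == 0) return false;
    return requested > hw;
}
```

In an OpenMP build, `requested` could come from `omp_get_max_threads()` (declared in `<omp.h>`), which reflects `OMP_NUM_THREADS` among other settings.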

File to reproduce: https://github.com/DiamonDinoia/nda/blob/nda-tensor-merge-master/benchmarks/tensor_contract_tblis.cpp


PS: My educated guess is that the spinlock sits on the same cache line as some other data, causing false sharing.
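A common way to test this hypothesis is to force the lock onto its own cache line. The sketch below uses hypothetical struct names and assumes 64-byte cache lines, which holds for current x86 cores. Where the standard library provides it, C++17's `std::hardware_destructive_interference_size` could replace the constant.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: 64-byte cache lines (true on current x86 CPUs).
constexpr std::size_t kCacheLine = 64;

// Hypothetical layout where the spin flag and a frequently written counter may
// share a cache line: each write to `hot_counter` then invalidates the line
// that spinning threads are polling.
struct Unpadded {
    std::atomic<bool> lock{false};
    std::atomic<long> hot_counter{0};
};

// Same fields, each forced onto its own cache line with alignas.
struct Padded {
    alignas(kCacheLine) std::atomic<bool> lock{false};
    alignas(kCacheLine) std::atomic<long> hot_counter{0};
};

// Index of the cache line containing address `p`.
std::uintptr_t line_of(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) / kCacheLine;
}

// Checks that the padded layout really separates the two fields.
bool padded_fields_on_separate_lines() {
    Padded p;
    return line_of(&p.lock) != line_of(&p.hot_counter);
}
```

If padding the barrier or lock state in a local BLIS build restores performance, that would support the false-sharing explanation. If it does not, the slowdown is more likely caused by the spinning itself under oversubscription.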

TBLIS 2.0 beta oversubscription causes performance to collapse #80

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

TBLIS 2.0 beta oversubscription causes performance to collapse #80

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions