Description
Describe the bug
Testing on CPU and GPU devices shows that the following settings fail when run on a GPU device:
out_wfc_re_im 1
out_pchg 1
out_wfc_norm 1
The error output is:
Unexpected Device Error /abacus-develop/source/source_base/module_device/cuda/memory_op.cu:184: cudaErrorIllegalAddress, an illegal memory access was encountered
The output code lies in source/source_esolver/esolver_ks_pw.cpp:
const std::vector<int> out_wfc_norm = PARAM.inp.out_wfc_norm;
const std::vector<int> out_wfc_re_im = PARAM.inp.out_wfc_re_im;
if (out_wfc_norm.size() > 0 || out_wfc_re_im.size() > 0)
{
    ModuleIO::get_wf_pw(out_wfc_norm,
                        out_wfc_re_im,
                        this->kspw_psi->get_nbands(),
                        PARAM.inp.nspin,
                        this->pw_rhod->nx,
                        this->pw_rhod->ny,
                        this->pw_rhod->nz,
                        this->pw_rhod->nxyz,
                        &ucell,
                        this->psi,
                        this->pw_wfc,
                        this->ctx,
                        this->Pgrid,
                        PARAM.globalv.global_out_dir,
                        this->kv,
                        GlobalV::KPAR,
                        GlobalV::MY_POOL);
}
Investigation shows that, in get_wf_pw of source/source_io/get_wf_pw.h, the call
pw_wfc->recip_to_real(ctx, &psi[0](ib, 0), wfcr_norm.data(), ik);
attempts a reciprocal-to-real transform on the GPU, while both the input psi and the output wfcr_norm are CPU host arrays:
// declared in source/source_esolver/esolver_ks_pw.h
psi::Psi<std::complex<double>, base_device::DEVICE_CPU>* psi = nullptr;
// declared in source/source_io/get_wf_pw.h
std::vector<std::complex<double>> wfcr_norm(nxyz);
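One way to surface this kind of mismatch earlier is to encode the owning device in the types and reject inconsistent combinations at compile time. The following is only a minimal, self-contained sketch of that idea, not the ABACUS implementation: ToyPsi, host_path_ok, recip_to_real_host, and the DEVICE_CPU/DEVICE_GPU tags defined here are hypothetical stand-ins, and the "transform" is a plain copy in place of the real FFT.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <type_traits>
#include <vector>

// Stand-in device tags (hypothetical; not the base_device definitions).
struct DEVICE_CPU {};
struct DEVICE_GPU {};

// Toy wavefunction container that records its owning device in the type,
// mirroring how psi::Psi<T, Device> is parameterized in the report.
template <typename T, typename Device>
class ToyPsi
{
  public:
    using device_type = Device;
    explicit ToyPsi(std::size_t n, T value = T()) : data_(n, value) {}
    const T* data() const { return data_.data(); }
    std::size_t size() const { return data_.size(); }

  private:
    std::vector<T> data_; // host storage; a real GPU Psi would own device memory
};

// True only when the context and the input both live on the host, which is
// what a std::vector output buffer requires.
template <typename CtxDevice, typename PsiDevice>
struct host_path_ok
    : std::integral_constant<bool,
                             std::is_same<CtxDevice, DEVICE_CPU>::value
                                 && std::is_same<PsiDevice, DEVICE_CPU>::value>
{
};

// Hypothetical wrapper: refuses to compile when a host output buffer is
// combined with a GPU context or a GPU-resident input.
template <typename CtxDevice, typename T, typename PsiDevice>
void recip_to_real_host(const CtxDevice* /*ctx*/,
                        const ToyPsi<T, PsiDevice>& in,
                        std::vector<T>& out)
{
    static_assert(host_path_ok<CtxDevice, PsiDevice>::value,
                  "host output buffer requires a CPU context and CPU-resident psi");
    out.resize(in.size());
    // Placeholder for the transform: a plain copy. Real code would run the
    // reciprocal-to-real FFT here.
    for (std::size_t i = 0; i < in.size(); ++i)
    {
        out[i] = in.data()[i];
    }
}
```

With these toy types, passing a DEVICE_GPU context pointer together with a host std::vector output would be rejected by the static_assert at compile time, rather than surfacing at run time as cudaErrorIllegalAddress. Whether a compile-time check, a runtime guard, or an explicit host-to-device copy suits the actual code base is a design decision for the maintainers.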
- Also note that out_wfc_pw works correctly on GPU.
Expected behavior
Wavefunction output on GPU should work as it does on CPU.
To Reproduce
Build with CUDA and testing enabled:
cmake -B build -DBUILD_TESTING=ON -DUSE_CUDA=ON
Then run abacus-develop/tests/11_PW_GPU/BUG_PW_outwfcr_GPU with the setting
out_wfc_norm 1
This reproduces the error shown above.
Environment
Distributor ID: Ubuntu
Description: Ubuntu 22.04.4 LTS
Release: 22.04
Codename: jammy
gcc (Ubuntu 12.3.0-1ubuntu1~22.04) 12.3.0
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Tue_Feb_27_16:19:38_PST_2024
Cuda compilation tools, release 12.4, V12.4.99
Build cuda_12.4.r12.4/compiler.33961263_0
Additional Context
No response
Task list for Issue attackers (only for developers)
- Verify the issue is not a duplicate.
- Describe the bug.
- Steps to reproduce.
- Expected behavior.
- Error message.
- Environment details.
- Additional context.
- Assign a priority level (low, medium, high, urgent).
- Assign the issue to a team member.
- Label the issue with relevant tags.
- Identify possible related issues.
- Create a unit test or automated test to reproduce the bug (if applicable).
- Fix the bug.
- Test the fix.
- Update documentation (if necessary).
- Close the issue and inform the reporter (if applicable).