Circuit segmentation fault when using newinsts #893
Comments
Pushed a fix to master. Assigning back to @eddy16112 to confirm and close.
I am sorry to say the bug is not fixed. I still get the same error, and the backtrace is the same as before.
Are you sure you are running on the right commit? Did you rebuild from scratch? Your reproducer no longer reproduces for me.
You should be on this commit: https://gitlab.com/StanfordLegion/legion/-/commit/17adf5a87e3f10c019424a2f64ea0dbd03e14e11
Yes, I am on the right commit. Can you try increasing -np to 4 or 5, or setting -ll:gpuworkthread 1, to see if you can catch it?
Please pull and try again.
So far so good! Thanks, I will close the issue now.
I noticed a bug in master after the newinsts merge when running circuit like this:
mpirun -x LEGION_FREEZE_ON_ERROR=1 -np 3 ./circuit -ll:gpu 1 -wpp 65536 -npp 65536
I am running on a single node with REALM_NETWORKS=mpi.
Here is the backtrace:
#2 0x0000000002b9abc1 in Realm::realm_freeze (signal=11) at /home/wwu/legion-newinsts/runtime/realm/runtime_impl.cc:138
#3
#4 Realm::Rect<1, long long>::contains (this=0x10, p=...) at /home/wwu/legion-newinsts/runtime/realm/point.inl:512
#5 0x0000000001c9ccbb in Realm::InstancePieceList<1, long long>::find_piece (this=0x2b59983c7e90, p=...) at /home/wwu/legion-newinsts/runtime/realm/inst_layout.inl:392
#6 0x0000000002d8f30e in Realm::TransferIteratorBase<1, long long>::step (this=0x2b59904c3350, max_bytes=4, info=..., flags=1536, tentative=true) at /home/wwu/legion-newinsts/runtime/realm/transfer/transfer.cc:198
#7 0x0000000002da4f57 in Realm::XferDes::default_get_requests (this=0x2b59904c3650, reqs=0x2b58ec501b48, nr=1, flags=1542) at /home/wwu/legion-newinsts/runtime/realm/transfer/channel.cc:894
#8 0x0000000002da8979 in Realm::GPUXferDes::get_requests (this=0x2b59904c3650, requests=0x2b58ec501b48, nr=1) at /home/wwu/legion-newinsts/runtime/realm/transfer/channel.cc:2013
#9 0x0000000002da8c02 in Realm::GPUXferDes::progress_xd (this=0x2b59904c3650, channel=0x88d6ab0, work_until=...) at /home/wwu/legion-newinsts/runtime/realm/transfer/channel.cc:2061
#10 0x0000000002dbb63c in Realm::XDQueue<Realm::GPUChannel, Realm::GPUXferDes>::do_work (this=0x88d6ad8, work_until=...) at /home/wwu/legion-newinsts/runtime/realm/transfer/channel.inl:303
#11 0x0000000002bd3298 in Realm::BackgroundWorkManager::Worker::do_work (this=0x2b58ec502130, max_time_in_ns=-1, interrupt_flag=0x0) at /home/wwu/legion-newinsts/runtime/realm/bgwork.cc:536
#12 0x0000000002bd1368 in Realm::BackgroundWorkThread::main_loop (this=0x52035f0) at /home/wwu/legion-newinsts/runtime/realm/bgwork.cc:135
#13 0x0000000002bd451c in Realm::Thread::thread_entry_wrapper<Realm::BackgroundWorkThread, &Realm::BackgroundWorkThread::main_loop> (obj=0x52035f0) at /home/wwu/legion-newinsts/runtime/realm/threads.inl:97
#14 0x0000000002df0268 in Realm::KernelThread::pthread_entry (data=0x5203690) at /home/wwu/legion-newinsts/runtime/realm/threads.cc:729
Sean thought it might be a Legion bug: instances are being deleted before the copies that target them have completed.
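To illustrate the ordering problem Sean describes, here is a minimal, self-contained C++ sketch. It does not use the Realm or Legion API: `Instance`, `Event`, `issue_copy`, `destroy_after`, and `run_example` are hypothetical names invented for this example, and `std::shared_future` merely stands in for a completion event. It only shows the general idea of making the deletion of a copy's target wait on the copy's completion.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <memory>
#include <numeric>
#include <vector>

// Hypothetical stand-in for a physical instance: a plain buffer.
struct Instance {
  std::vector<int> data;
  explicit Instance(std::size_t n) : data(n, 0) {}
};

// Hypothetical stand-in for a completion event.
using Event = std::shared_future<void>;

// Start an asynchronous copy into *dst. The task holds only a raw pointer,
// so the caller must keep the instance alive until the returned event fires.
Event issue_copy(std::vector<int> src, Instance* dst) {
  return std::async(std::launch::async, [src, dst] {
    const std::size_t n = std::min(src.size(), dst->data.size());
    for (std::size_t i = 0; i < n; ++i) dst->data[i] = src[i];
  }).share();
}

// Deferred destruction: free the instance only once `precondition` has fired.
// Freeing it earlier would leave the copy writing into released memory.
void destroy_after(std::unique_ptr<Instance>& inst, const Event& precondition) {
  precondition.wait();
  inst.reset();
}

// Copies {1, 2, 3} into a fresh instance, sums the result, then destroys the
// instance with the copy's completion event as the precondition.
int run_example() {
  auto inst = std::make_unique<Instance>(3);
  Event done = issue_copy({1, 2, 3}, inst.get());
  done.wait();  // safe to read the destination only after completion
  const int sum = std::accumulate(inst->data.begin(), inst->data.end(), 0);
  destroy_after(inst, done);
  assert(inst == nullptr);
  return sum;
}
```

Calling `run_example()` from a `main` function exercises the safe ordering. Resetting `inst` right after `issue_copy` returns, without waiting on `done`, would reproduce the kind of use-after-free suspected here.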