Assertion `finder != proc_spaces.end()' failed. #305

Closed
jiazhihao opened this issue Oct 1, 2017 · 7 comments

Comments

@jiazhihao
Contributor

jiazhihao commented Oct 1, 2017

I encountered the following assertion failure on the current master branch. The failure can be reproduced deterministically on n0000 with the following steps:

zhihao@n0000: cd /home/zhihao/legion/apps/cnn/
zhihao@n0000:/legion/apps/cnn$ ./cnn -ll:gpu 2 -ll:fsize 4000
cnn: /home/zhihao/legion/runtime//legion/runtime.cc:12348: Legion::AddressSpaceID Legion::Internal::Runtime::find_address_space(Legion::Processor) const: Assertion `finder != proc_spaces.end()' failed.

@streichler
Contributor

@jiazhihao remind me where this app code is? Also, can you generate a backtrace?

@jiazhihao
Contributor Author

Attached please find the backtrace. The code is available at sapling:~/home/zhihao/legion/apps/cnn. You can compile the code in that subfolder.

[0] Thread 16 (Thread 0x7f91facd8700 (LWP 30156)):
[0] #0 0x00007f91fd896fdd in poll () at ../sysdeps/unix/syscall-template.S:81
[0] #1 0x00007f91fcfbe196 in poll_dispatch () from /usr/local/openmpi-1.8.2/lib/libopen-pal.so.6
[0] #2 0x00007f91fcfb54fb in opal_libevent2021_event_base_loop () from /usr/local/openmpi-1.8.2/lib/libopen-pal.so.6
[0] #3 0x00007f91fd26096e in orte_progress_thread_engine () from /usr/local/openmpi-1.8.2/lib/libopen-rte.so.7
[0] #4 0x00007f91ff830184 in start_thread (arg=0x7f91facd8700) at pthread_create.c:312
[0] #5 0x00007f91fd8a437d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
[0]
[0] Thread 15 (Thread 0x7f91f7108700 (LWP 30157)):
[0] #0 0x00007f91fd89bc53 in select () at ../sysdeps/unix/syscall-template.S:81
[0] #1 0x00007f91f94ab856 in service_thread_start () from /usr/local/openmpi-1.8.2/lib/openmpi/mca_btl_openib.so
[0] #2 0x00007f91ff830184 in start_thread (arg=0x7f91f7108700) at pthread_create.c:312
[0] #3 0x00007f91fd8a437d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
[0]
[0] Thread 14 (Thread 0x7f91f6700700 (LWP 30158)):
[0] #0 0x00007f91fd896fdd in poll () at ../sysdeps/unix/syscall-template.S:81
[0] #1 0x00007f91f94aa5f5 in btl_openib_async_thread () from /usr/local/openmpi-1.8.2/lib/openmpi/mca_btl_openib.so
[0] #2 0x00007f91ff830184 in start_thread (arg=0x7f91f6700700) at pthread_create.c:312
[0] #3 0x00007f91fd8a437d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
[0]
[0] Thread 13 (Thread 0x7f91d1aff700 (LWP 30160)):
[0] #0 0x00007f91fd896fdd in poll () at ../sysdeps/unix/syscall-template.S:81
[0] #1 0x00007f91fefc8179 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
[0] #2 0x00007f91fe9d0582 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
[0] #3 0x00007f91fefc8808 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
[0] #4 0x00007f91ff830184 in start_thread (arg=0x7f91d1aff700) at pthread_create.c:312
[0] #5 0x00007f91fd8a437d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
[0]
[0] Thread 12 (Thread 0x7f91d06fe700 (LWP 30161)):
[0] #0 0x00007f91fd896fdd in poll () at ../sysdeps/unix/syscall-template.S:81
[0] #1 0x00007f91fefc8179 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
[0] #2 0x00007f91fe9d0582 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
[0] #3 0x00007f91fefc8808 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
[0] #4 0x00007f91ff830184 in start_thread (arg=0x7f91d06fe700) at pthread_create.c:312
[0] #5 0x00007f91fd8a437d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
[0]
[0] Thread 11 (Thread 0x7f91aebfe700 (LWP 30162)):
[0] #0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
[0] #1 0x0000000002555fc0 in IncomingMessageManager::get_messages (this=0x37d1fc0, sender=@0x7f91aebfddfc: -1, wait=true) at /home/zhihao/legion/runtime//activemsg.cc:940
[0] #2 0x000000000255613e in IncomingMessageManager::handler_thread_loop (this=0x37d1fc0) at /home/zhihao/legion/runtime//activemsg.cc:982
[0] #3 0x000000000255ec8c in Realm::Thread::thread_entry_wrapper<IncomingMessageManager, &IncomingMessageManager::handler_thread_loop> (obj=0x37d1fc0) at /home/zhihao/legion/runtime//realm/threads.inl:131
[0] #4 0x0000000001f0ec0b in Realm::KernelThread::pthread_entry (data=0x37d2340) at /home/zhihao/legion/runtime//realm/threads.cc:692
[0] #5 0x00007f91ff830184 in start_thread (arg=0x7f91aebfe700) at pthread_create.c:312
[0] #6 0x00007f91fd8a437d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
[0]
[0] Thread 10 (Thread 0x7f91ae9fd700 (LWP 30163)):
[0] #0 0x0000000002589970 in gasnetc_AMPoll ()
[0] #1 0x00000000024c530d in gasneti_AMPoll () at /usr/local/gasnet-1.22.4-openmpi/include/gasnet_help.h:597
[0] #2 0x000000000255801f in gasnet_AMPoll () at /usr/local/gasnet-1.22.4-openmpi/include/gasnet_help.h:712
[0] #3 0x0000000002557396 in EndpointManager::polling_worker_loop (this=0x37c5680) at /home/zhihao/legion/runtime//activemsg.cc:2451
[0] #4 0x000000000255f058 in Realm::Thread::thread_entry_wrapper<EndpointManager, &EndpointManager::polling_worker_loop> (obj=0x37c5680) at /home/zhihao/legion/runtime//realm/threads.inl:131
[0] #5 0x0000000001f0ec0b in Realm::KernelThread::pthread_entry (data=0x37d7020) at /home/zhihao/legion/runtime//realm/threads.cc:692
[0] #6 0x00007f91ff830184 in start_thread (arg=0x7f91ae9fd700) at pthread_create.c:312
[0] #7 0x00007f91fd8a437d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
[0]
[0] Thread 9 (Thread 0x7f91ae1fc700 (LWP 30164)):
[0] #0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
[0] #1 0x0000000001e89480 in GASNetCondVar::wait (this=0x383c4a8) at /home/zhihao/legion/runtime/activemsg.h:202
[0] #2 0x0000000001eea7ef in Realm::DmaRequestQueue::dequeue_request (this=0x383c480, sleep=true) at /home/zhihao/legion/runtime//realm/transfer/lowlevel_dma.cc:334
[0] #3 0x0000000001ef31b2 in Realm::DmaRequestQueue::worker_thread_loop (this=0x383c480) at /home/zhihao/legion/runtime//realm/transfer/lowlevel_dma.cc:2406
[0] #4 0x0000000001efa228 in Realm::Thread::thread_entry_wrapper<Realm::DmaRequestQueue, &Realm::DmaRequestQueue::worker_thread_loop> (obj=0x383c480) at /home/zhihao/legion/runtime/realm/threads.inl:131
[0] #5 0x0000000001f0ec0b in Realm::KernelThread::pthread_entry (data=0x383c720) at /home/zhihao/legion/runtime//realm/threads.cc:692
[0] #6 0x00007f91ff830184 in start_thread (arg=0x7f91ae1fc700) at pthread_create.c:312
[0] #7 0x00007f91fd8a437d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
[0]
[0] Thread 8 (Thread 0x7f91ad9fb700 (LWP 30165)):
[0] #0 0x00007f91fd8752a7 in sched_yield () at ../sysdeps/unix/syscall-template.S:81
[0] #1 0x0000000001f1006b in Realm::Thread::yield () at /home/zhihao/legion/runtime//realm/threads.cc:1081
[0] #2 0x0000000001f41d5c in Realm::PartitioningOpQueue::worker_thread_loop (this=0x383cc40) at /home/zhihao/legion/runtime//realm/deppart/partitions.cc:864
[0] #3 0x0000000001f59b9c in Realm::Thread::thread_entry_wrapper<Realm::PartitioningOpQueue, &Realm::PartitioningOpQueue::worker_thread_loop> (obj=0x383cc40) at /home/zhihao/legion/runtime//realm/threads.inl:131
[0] #4 0x0000000001f0ec0b in Realm::KernelThread::pthread_entry (data=0x383cdc0) at /home/zhihao/legion/runtime//realm/threads.cc:692
[0] #5 0x00007f91ff830184 in start_thread (arg=0x7f91ad9fb700) at pthread_create.c:312
[0] #6 0x00007f91fd8a437d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
[0]
[0] Thread 7 (Thread 0x7f91ad1fa700 (LWP 30166)):
[0] #0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
[0] #1 0x0000000001e89480 in GASNetCondVar::wait (this=0x383d210) at /home/zhihao/legion/runtime/activemsg.h:202
[0] #2 0x000000000253961a in Realm::Cuda::GPUWorker::process_streams (this=0x383d1e0, sleep_on_empty=true) at /home/zhihao/legion/runtime//realm/cuda/cuda_module.cc:1467
[0] #3 0x0000000002539777 in Realm::Cuda::GPUWorker::thread_main (this=0x383d1e0) at /home/zhihao/legion/runtime//realm/cuda/cuda_module.cc:1499
[0] #4 0x0000000002543718 in Realm::Thread::thread_entry_wrapper<Realm::Cuda::GPUWorker, &Realm::Cuda::GPUWorker::thread_main> (obj=0x383d1e0) at /home/zhihao/legion/runtime/realm/threads.inl:131
[0] #5 0x0000000001f0ec0b in Realm::KernelThread::pthread_entry (data=0x383d480) at /home/zhihao/legion/runtime//realm/threads.cc:692
[0] #6 0x00007f91ff830184 in start_thread (arg=0x7f91ad1fa700) at pthread_create.c:312
[0] #7 0x00007f91fd8a437d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
[0]
[0] Thread 6 (Thread 0x7f91ac9f9700 (LWP 30167)):
[0] #0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
[0] #1 0x0000000001e89480 in GASNetCondVar::wait (this=0x7f91cec1fe98) at /home/zhihao/legion/runtime/activemsg.h:202
[0] #2 0x0000000001f2d85f in Realm::ThreadedTaskScheduler::WorkCounter::wait_for_work (this=0x7f91cec1fe60, old_counter=1) at /home/zhihao/legion/runtime//realm/tasks.cc:303
[0] #3 0x0000000001f2eb56 in Realm::ThreadedTaskScheduler::wait_for_work (this=0x7f91cec1fd20, old_work_counter=1) at /home/zhihao/legion/runtime//realm/tasks.cc:684
[0] #4 0x0000000001f2f8a7 in Realm::KernelThreadTaskScheduler::wait_for_work (this=0x7f91cec1fd20, old_work_counter=1) at /home/zhihao/legion/runtime//realm/tasks.cc:906
[0] #5 0x0000000001f2ea8a in Realm::ThreadedTaskScheduler::scheduler_loop (this=0x7f91cec1fd20) at /home/zhihao/legion/runtime//realm/tasks.cc:660
[0] #6 0x0000000001f2eacc in Realm::ThreadedTaskScheduler::scheduler_loop_wlock (this=0x7f91cec1fd20) at /home/zhihao/legion/runtime//realm/tasks.cc:672
[0] #7 0x0000000001f342aa in Realm::Thread::thread_entry_wrapper<Realm::ThreadedTaskScheduler, &Realm::ThreadedTaskScheduler::scheduler_loop_wlock> (obj=0x7f91cec1fd20) at /home/zhihao/legion/runtime//realm/threads.inl:131
[0] #8 0x0000000001f0ec0b in Realm::KernelThread::pthread_entry (data=0x7f91cec201c0) at /home/zhihao/legion/runtime//realm/threads.cc:692
[0] #9 0x00007f91ff830184 in start_thread (arg=0x7f91ac9f9700) at pthread_create.c:312
[0] #10 0x00007f91fd8a437d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
[0]
[0] Thread 5 (Thread 0x7f91ac7f8700 (LWP 30168)):
[0] #0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
[0] #1 0x0000000001e89480 in GASNetCondVar::wait (this=0x7f91cec21e78) at /home/zhihao/legion/runtime/activemsg.h:202
[0] #2 0x0000000001f2d85f in Realm::ThreadedTaskScheduler::WorkCounter::wait_for_work (this=0x7f91cec21e40, old_counter=1) at /home/zhihao/legion/runtime//realm/tasks.cc:303
[0] #3 0x0000000001f2eb56 in Realm::ThreadedTaskScheduler::wait_for_work (this=0x7f91cec21d00, old_work_counter=1) at /home/zhihao/legion/runtime//realm/tasks.cc:684
[0] #4 0x0000000001f2f8a7 in Realm::KernelThreadTaskScheduler::wait_for_work (this=0x7f91cec21d00, old_work_counter=1) at /home/zhihao/legion/runtime//realm/tasks.cc:906
[0] #5 0x0000000001f2ea8a in Realm::ThreadedTaskScheduler::scheduler_loop (this=0x7f91cec21d00) at /home/zhihao/legion/runtime//realm/tasks.cc:660
[0] #6 0x0000000001f2eacc in Realm::ThreadedTaskScheduler::scheduler_loop_wlock (this=0x7f91cec21d00) at /home/zhihao/legion/runtime//realm/tasks.cc:672
[0] #7 0x0000000001f342aa in Realm::Thread::thread_entry_wrapper<Realm::ThreadedTaskScheduler, &Realm::ThreadedTaskScheduler::scheduler_loop_wlock> (obj=0x7f91cec21d00) at /home/zhihao/legion/runtime//realm/threads.inl:131
[0] #8 0x0000000001f0ec0b in Realm::KernelThread::pthread_entry (data=0x7f91cec22080) at /home/zhihao/legion/runtime//realm/threads.cc:692
[0] #9 0x00007f91ff830184 in start_thread (arg=0x7f91ac7f8700) at pthread_create.c:312
[0] #10 0x00007f91fd8a437d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
[0]
[0] Thread 4 (Thread 0x7f91ac5f7700 (LWP 30169)):
[0] #0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
[0] #1 0x0000000001ed62d8 in Realm::XferDesQueue::dequeue_xferDes (this=0x7f91cec299e0, dma_thread=0x7f91cec2c880, wait_on_empty=true) at /home/zhihao/legion/runtime//realm/transfer/channel.h:1306
[0] #2 0x0000000001ed2112 in Realm::DMAThread::dma_thread_loop (this=0x7f91cec2c880) at /home/zhihao/legion/runtime//realm/transfer/channel.cc:2392
[0] #3 0x0000000001edb714 in Realm::Thread::thread_entry_wrapper<Realm::DMAThread, &Realm::DMAThread::dma_thread_loop> (obj=0x7f91cec2c880) at /home/zhihao/legion/runtime/realm/threads.inl:131
[0] #4 0x0000000001f0ec0b in Realm::KernelThread::pthread_entry (data=0x7f91cec2f940) at /home/zhihao/legion/runtime//realm/threads.cc:692
[0] #5 0x00007f91ff830184 in start_thread (arg=0x7f91ac5f7700) at pthread_create.c:312
[0] #6 0x00007f91fd8a437d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
[0]
[0] Thread 3 (Thread 0x7f91cef91700 (LWP 30170)):
[0] #0 0x00007f91fd86aa59 in __libc_waitpid (pid=30172, stat_loc=stat_loc@entry=0x7f91abdf0d90, options=options@entry=0) at ../sysdeps/unix/sysv/linux/waitpid.c:40
[0] #1 0x00007f91fd7f0232 in do_system (line=) at ../sysdeps/posix/system.c:148
[0] #2 0x0000000002596b88 in gasneti_bt_gdb ()
[0] #3 0x00000000025999eb in gasneti_print_backtrace ()
[0] #4 0x00000000025f2521 in gasneti_defaultSignalHandler ()
[0] #5
[0] #6 0x00007f91fd7e0c37 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
[0] #7 0x00007f91fd7e4028 in __GI_abort () at abort.c:89
[0] #8 0x00007f91fd7d9bf6 in __assert_fail_base (fmt=0x7f91fd92a3b8 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x26c706c "finder != proc_spaces.end()", file=file@entry=0x26c1bb0 "/home/zhihao/legion/runtime//legion/runtime.cc", line=line@entry=12348, function=function@entry=0x26d02e0 <Legion::Internal::Runtime::find_address_space(Realm::Processor) const::PRETTY_FUNCTION> "Legion::AddressSpaceID Legion::Internal::Runtime::find_address_space(Legion::Processor) const") at assert.c:92
[0] #9 0x00007f91fd7d9ca2 in __GI___assert_fail (assertion=0x26c706c "finder != proc_spaces.end()", file=0x26c1bb0 "/home/zhihao/legion/runtime//legion/runtime.cc", line=12348, function=0x26d02e0 <Legion::Internal::Runtime::find_address_space(Realm::Processor) const::PRETTY_FUNCTION> "Legion::AddressSpaceID Legion::Internal::Runtime::find_address_space(Legion::Processor) const") at assert.c:101
[0] #10 0x0000000001d6a210 in Legion::Internal::Runtime::find_address_space (this=0x3aeb100, target=...) at /home/zhihao/legion/runtime//legion/runtime.cc:12348
[0] #11 0x0000000001d6a189 in Legion::Internal::Runtime::find_messenger (this=0x3aeb100, target=...) at /home/zhihao/legion/runtime//legion/runtime.cc:12338
[0] #12 0x0000000001d6a8f2 in Legion::Internal::Runtime::send_task (this=0x3aeb100, task=0x3b236e0) at /home/zhihao/legion/runtime//legion/runtime.cc:12454
[0] #13 0x000000000191d21a in Legion::Internal::MultiTask::trigger_slices (this=0x3b21660) at /home/zhihao/legion/runtime//legion/legion_tasks.cc:4203
[0] #14 0x000000000191cfb1 in Legion::Internal::MultiTask::slice_index_space (this=0x3b21660) at /home/zhihao/legion/runtime//legion/legion_tasks.cc:4160
[0] #15 0x000000000191d8fb in Legion::Internal::MultiTask::trigger_mapping (this=0x3b21660) at /home/zhihao/legion/runtime//legion/legion_tasks.cc:4354
[0] #16 0x0000000001d7f241 in Legion::Internal::Runtime::legion_runtime_task (args=0x3a38ba0, arglen=12, userdata=0x0, userlen=0, p=...) at /home/zhihao/legion/runtime//legion/runtime.cc:19441
[0] #17 0x00000000024ba325 in Realm::LocalTaskProcessor::execute_task (this=0x7f91ceee4240, func_id=4, task_args=...) at /home/zhihao/legion/runtime//realm/proc_impl.cc:915
[0] #18 0x0000000001f2d57a in Realm::Task::execute_on_processor (this=0x3b21de0, p=...) at /home/zhihao/legion/runtime//realm/tasks.cc:167
[0] #19 0x0000000001f300fe in Realm::UserThreadTaskScheduler::execute_task (this=0x7f91ceee4460, task=0x3b21de0) at /home/zhihao/legion/runtime//realm/tasks.cc:1059
[0] #20 0x0000000001f2e773 in Realm::ThreadedTaskScheduler::scheduler_loop (this=0x7f91ceee4460) at /home/zhihao/legion/runtime//realm/tasks.cc:591
[0] #21 0x0000000001f3451c in Realm::Thread::thread_entry_wrapper<Realm::ThreadedTaskScheduler, &Realm::ThreadedTaskScheduler::scheduler_loop> (obj=0x7f91ceee4460) at /home/zhihao/legion/runtime//realm/threads.inl:131
[0] #22 0x0000000001f0f5ea in Realm::UserThread::uthread_entry () at /home/zhihao/legion/runtime//realm/threads.cc:910
[0] #23 0x00007f91fd7f3800 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
[0] #24 0x0000000000000000 in ?? ()
[0]
[0] Thread 2 (Thread 0x7f91cef50700 (LWP 30171)):
[0] #0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
[0] #1 0x0000000001e89480 in GASNetCondVar::wait (this=0x7f91ceee52f8) at /home/zhihao/legion/runtime/activemsg.h:202
[0] #2 0x0000000001f2d85f in Realm::ThreadedTaskScheduler::WorkCounter::wait_for_work (this=0x7f91ceee52c0, old_counter=5) at /home/zhihao/legion/runtime//realm/tasks.cc:303
[0] #3 0x0000000001f2eb56 in Realm::ThreadedTaskScheduler::wait_for_work (this=0x7f91ceee5180, old_work_counter=5) at /home/zhihao/legion/runtime//realm/tasks.cc:684
[0] #4 0x0000000001f30381 in Realm::UserThreadTaskScheduler::wait_for_work (this=0x7f91ceee5180, old_work_counter=5) at /home/zhihao/legion/runtime//realm/tasks.cc:1161
[0] #5 0x0000000001f2ea8a in Realm::ThreadedTaskScheduler::scheduler_loop (this=0x7f91ceee5180) at /home/zhihao/legion/runtime//realm/tasks.cc:660
[0] #6 0x0000000001f3451c in Realm::Thread::thread_entry_wrapper<Realm::ThreadedTaskScheduler, &Realm::ThreadedTaskScheduler::scheduler_loop> (obj=0x7f91ceee5180) at /home/zhihao/legion/runtime//realm/threads.inl:131
[0] #7 0x0000000001f0f5ea in Realm::UserThread::uthread_entry () at /home/zhihao/legion/runtime//realm/threads.cc:910
[0] #8 0x00007f91fd7f3800 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
[0] #9 0x0000000000000000 in ?? ()
[0]
[0] Thread 1 (Thread 0x7f92036bc7c0 (LWP 30155)):
[0] #0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
[0] #1 0x0000000001e89480 in GASNetCondVar::wait (this=0x3651758) at /home/zhihao/legion/runtime/activemsg.h:202
[0] #2 0x0000000001e86d25 in Realm::RuntimeImpl::wait_for_shutdown (this=0x3651590) at /home/zhihao/legion/runtime//realm/runtime_impl.cc:2051
[0] #3 0x0000000001e7f0a0 in Realm::Runtime::wait_for_shutdown (this=0x7fff91f14350) at /home/zhihao/legion/runtime//realm/runtime_impl.cc:394
[0] #4 0x0000000001d7c1c7 in Legion::Internal::Runtime::start (argc=5, argv=0x7fff91f15678, background=false) at /home/zhihao/legion/runtime//legion/runtime.cc:18581
[0] #5 0x000000000185db77 in Legion::Runtime::start (argc=5, argv=0x7fff91f15678, background=false) at /home/zhihao/legion/runtime//legion/legion.cc:6652
[0] #6 0x0000000001823f63 in main (argc=5, argv=0x7fff91f15678) at cnn.cc:247

@lightsighter
Contributor

This is an application bug. The mapper is returning an invalid target processor ID for one of the slices from a call to 'slice_task':

proc = {
id = 0x300000028
},

I will fix the runtime to actually check for bad processor IDs, but the bug is going to be in your custom mapper.
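As a rough illustration of the kind of check described above, here is a minimal sketch that reports an unknown processor ID to the caller instead of asserting. It is not the Legion implementation: `ProcID`, `AddressSpaceID`, and `lookup_address_space` are hypothetical stand-ins, and `proc_spaces` is modeled as a plain `std::map`.

```cpp
#include <cstdint>
#include <map>

// Hypothetical stand-ins for the runtime's types (assumptions, not Legion's definitions).
typedef std::uint64_t ProcID;
typedef unsigned AddressSpaceID;

// Look up the address space that owns `target`.
// Returns false (leaving `out` untouched) when the processor ID is unknown,
// so the caller can report a bad mapper result instead of hitting an assertion.
bool lookup_address_space(const std::map<ProcID, AddressSpaceID>& proc_spaces,
                          ProcID target, AddressSpaceID& out) {
  std::map<ProcID, AddressSpaceID>::const_iterator finder = proc_spaces.find(target);
  if (finder == proc_spaces.end())
    return false;  // invalid processor ID, e.g. one read past the end of an array
  out = finder->second;
  return true;
}
```

A caller would test the return value and print a diagnostic naming the mapper call that produced the bad processor.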

@lightsighter
Contributor

Just looking at the code in your mapper, it looks like you need to replace this:

gpus[idx]

with this:

gpus[idx % gpus.size()]
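
To show why the modulo matters, here is a minimal sketch of round-robin slice assignment. It assumes `gpus` is a `std::vector` of processor handles; `Proc` and `assign_slices` are hypothetical names for illustration, not the mapper's actual code. With fewer GPUs than slices, plain `gpus[idx]` reads past the end of the vector and yields a garbage processor ID, while `idx % gpus.size()` wraps around.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a processor handle (not the Realm type).
struct Proc {
  std::uint64_t id;
};

// Pick a target GPU for each of `num_slices` slices, wrapping around
// when there are more slices than GPUs.
std::vector<Proc> assign_slices(const std::vector<Proc>& gpus, std::size_t num_slices) {
  if (gpus.empty())
    throw std::invalid_argument("assign_slices: no GPUs available");
  std::vector<Proc> targets;
  targets.reserve(num_slices);
  for (std::size_t idx = 0; idx < num_slices; ++idx)
    targets.push_back(gpus[idx % gpus.size()]);  // always a valid index
  assert(targets.size() == num_slices);
  return targets;
}
```

For example, with two GPUs and five slices, the slices are assigned to GPUs 0, 1, 0, 1, 0 in turn.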

@lightsighter
Contributor

Speaking of which, @streichler how hard would it be to add an interface to all the different kinds of realm types that support the 'exists' method to also support a 'valid' method?

@jiazhihao
Contributor Author

I see. That's why I cannot reproduce the failure if the application is given enough GPUs.

@jiazhihao
Contributor Author

Closing this since it is an application bug.
