Error in index space launcher #94

Closed
alex-lee678294 opened this issue Nov 3, 2015 · 11 comments

@alex-lee678294

This is the error I get after adding an index space launch. Could you please tell me what is wrong? Thanks a lot!

main: /home/xyyue/legion/runtime/legion/legion_tasks.cc:8818: virtual void LegionRuntime::HighLevel::IndexTask::trigger_dependence_analysis(): Assertion `rerun_analysis_requirements.empty()' failed.
*** Caught a fatal signal: SIGABRT(6) on node 0/1
[0] /usr/bin/gdb -nx -batch -x /tmp/gasnet_UkxE5e '/local/home/xyyue/cg_codes/./main' 29205
[0] [New LWP 29213]
[0] [New LWP 29212]
[0] [New LWP 29211]
[0] [New LWP 29210]
[0] [New LWP 29209]
[0] [New LWP 29208]
[0] [New LWP 29207]
[0] [New LWP 29206]
[0] [Thread debugging using libthread_db enabled]
[0] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[0] pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
[0]   Id   Target Id         Frame
[0]   9    Thread 0x7f9f02d1a700 (LWP 29206) "main" 0x00007f9f057d812d in poll () at ../sysdeps/unix/syscall-template.S:81
[0]   8    Thread 0x7f9eff14a700 (LWP 29207) "main" 0x00007f9f057dcda3 in select () at ../sysdeps/unix/syscall-template.S:81
[0]   7    Thread 0x7f9efe539700 (LWP 29208) "main" 0x00007f9f057d812d in poll () at ../sysdeps/unix/syscall-template.S:81
[0]   6    Thread 0x7f9ec9937700 (LWP 29209) "main" 0x0000000000d67230 in gasnetc_snd_reap.constprop ()
[0]   5    Thread 0x7f9f003e8700 (LWP 29210) "main" pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
[0]   4    Thread 0x7f9ec9136700 (LWP 29211) "main" pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
[0]   3    Thread 0x7f9f045d9700 (LWP 29212) "main" 0x00007f9f057abb99 in __libc_waitpid (pid=29233, stat_loc=stat_loc@entry=0x7f9ec83fb150, options=options@entry=0) at ../sysdeps/unix/sysv/linux/waitpid.c:40
[0]   2    Thread 0x7f9f045d5700 (LWP 29213) "main" pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
[0] * 1    Thread 0x7f9f06ddd7c0 (LWP 29205) "main" pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
[0]
[0] Thread 9 (Thread 0x7f9f02d1a700 (LWP 29206)):
[0] #0  0x00007f9f057d812d in poll () at ../sysdeps/unix/syscall-template.S:81
[0] #1  0x00007f9f05001196 in poll_dispatch () from /usr/local/openmpi-1.8.2/lib/libopen-pal.so.6
[0] #2  0x00007f9f04ff84fb in opal_libevent2021_event_base_loop () from /usr/local/openmpi-1.8.2/lib/libopen-pal.so.6
[0] #3  0x00007f9f052a396e in orte_progress_thread_engine () from /usr/local/openmpi-1.8.2/lib/libopen-rte.so.7
[0] #4  0x00007f9f067c0182 in start_thread (arg=0x7f9f02d1a700) at pthread_create.c:312
[0] #5  0x00007f9f057e547d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
[0]
[0] Thread 8 (Thread 0x7f9eff14a700 (LWP 29207)):
[0] #0  0x00007f9f057dcda3 in select () at ../sysdeps/unix/syscall-template.S:81
[0] #1  0x00007f9f014ed856 in service_thread_start () from /usr/local/openmpi-1.8.2/lib/openmpi/mca_btl_openib.so
[0] #2  0x00007f9f067c0182 in start_thread (arg=0x7f9eff14a700) at pthread_create.c:312
[0] #3  0x00007f9f057e547d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
[0]
[0] Thread 7 (Thread 0x7f9efe539700 (LWP 29208)):
[0] #0  0x00007f9f057d812d in poll () at ../sysdeps/unix/syscall-template.S:81
[0] #1  0x00007f9f014ec5f5 in btl_openib_async_thread () from /usr/local/openmpi-1.8.2/lib/openmpi/mca_btl_openib.so
[0] #2  0x00007f9f067c0182 in start_thread (arg=0x7f9efe539700) at pthread_create.c:312
[0] #3  0x00007f9f057e547d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
[0]
[0] Thread 6 (Thread 0x7f9ec9937700 (LWP 29209)):
[0] #0  0x0000000000d67230 in gasnetc_snd_reap.constprop ()
[0] #1  0x0000000000d6f89c in gasnetc_AMPoll ()
[0] #2  0x0000000000d22ea7 in gasneti_AMPoll () at /usr/local/gasnet-1.22.4-openmpi/include/gasnet_help.h:597
[0] #3  0x0000000000d38820 in gasnet_AMPoll () at /usr/local/gasnet-1.22.4-openmpi/include/gasnet_help.h:712
[0] #4  0x0000000000d37d8a in EndpointManager::polling_worker_loop (this=0x1726380) at /home/xyyue/legion/runtime/activemsg.cc:2141
[0] #5  0x0000000000d3f2b0 in Realm::Thread::thread_entry_wrapper<EndpointManager, &EndpointManager::polling_worker_loop> (obj=0x1726380) at /home/xyyue/legion/runtime/realm/threads.inl:127
[0] #6  0x0000000000cf1061 in Realm::KernelThread::pthread_entry (data=0x172bf00) at /home/xyyue/legion/runtime/realm/threads.cc:562
[0] #7  0x00007f9f067c0182 in start_thread (arg=0x7f9ec9937700) at pthread_create.c:312
[0] #8  0x00007f9f057e547d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
[0]
[0] Thread 5 (Thread 0x7f9f003e8700 (LWP 29210)):
[0] #0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
[0] #1  0x0000000000d36c5c in IncomingMessageManager::get_messages (this=0x172bf60, sender=@0x7f9f003e7cac: -1, wait=true) at /home/xyyue/legion/runtime/activemsg.cc:715
[0] #2  0x0000000000d36dd1 in IncomingMessageManager::handler_thread_loop (this=0x172bf60) at /home/xyyue/legion/runtime/activemsg.cc:754
[0] #3  0x0000000000d3efae in Realm::Thread::thread_entry_wrapper<IncomingMessageManager, &IncomingMessageManager::handler_thread_loop> (obj=0x172bf60) at /home/xyyue/legion/runtime/realm/threads.inl:127
[0] #4  0x0000000000cf1061 in Realm::KernelThread::pthread_entry (data=0x172c1c0) at /home/xyyue/legion/runtime/realm/threads.cc:562
[0] #5  0x00007f9f067c0182 in start_thread (arg=0x7f9f003e8700) at pthread_create.c:312
[0] #6  0x00007f9f057e547d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
[0]
[0] Thread 4 (Thread 0x7f9ec9136700 (LWP 29211)):
[0] #0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
[0] #1  0x0000000000ca7b4c in GASNetCondVar::wait (this=0x172c248) at /home/xyyue/legion/runtime/activemsg.h:169
[0] #2  0x0000000000cc1109 in LegionRuntime::LowLevel::DmaRequestQueue::dequeue_request (this=0x172c220, sleep=true) at /home/xyyue/legion/runtime/lowlevel_dma.cc:329
[0] #3  0x0000000000cc6ccb in LegionRuntime::LowLevel::DmaRequestQueue::worker_thread_loop (this=0x172c220) at /home/xyyue/legion/runtime/lowlevel_dma.cc:3558
[0] #4  0x0000000000cd6a64 in Realm::Thread::thread_entry_wrapper<LegionRuntime::LowLevel::DmaRequestQueue, &LegionRuntime::LowLevel::DmaRequestQueue::worker_thread_loop> (obj=0x172c220) at /home/xyyue/legion/runtime/realm/threads.inl:127
[0] #5  0x0000000000cf1061 in Realm::KernelThread::pthread_entry (data=0x172c4c0) at /home/xyyue/legion/runtime/realm/threads.cc:562
[0] #6  0x00007f9f067c0182 in start_thread (arg=0x7f9ec9136700) at pthread_create.c:312
[0] #7  0x00007f9f057e547d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
[0]
[0] Thread 3 (Thread 0x7f9f045d9700 (LWP 29212)):
[0] #0  0x00007f9f057abb99 in __libc_waitpid (pid=29233, stat_loc=stat_loc@entry=0x7f9ec83fb150, options=options@entry=0) at ../sysdeps/unix/sysv/linux/waitpid.c:40
[0] #1  0x00007f9f057312e2 in do_system (line=<optimized out>) at ../sysdeps/posix/system.c:148
[0] #2  0x0000000000d7ca58 in gasneti_bt_gdb ()
[0] #3  0x0000000000d7f8bb in gasneti_print_backtrace ()
[0] #4  0x0000000000dd83f1 in gasneti_defaultSignalHandler ()
[0] #5  <signal handler called>
[0] #6  0x00007f9f05721cc9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
[0] #7  0x00007f9f057250d8 in __GI_abort () at abort.c:89
[0] #8  0x00007f9f0571ab86 in __assert_fail_base (fmt=0x7f9f0586b830 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0xdf6f50 "rerun_analysis_requirements.empty()", file=file@entry=0xdf4e90 "/home/xyyue/legion/runtime/legion/legion_tasks.cc", line=line@entry=8818, function=function@entry=0xdfd980 <LegionRuntime::HighLevel::IndexTask::trigger_dependence_analysis()::__PRETTY_FUNCTION__> "virtual void LegionRuntime::HighLevel::IndexTask::trigger_dependence_analysis()") at assert.c:92
[0] #9  0x00007f9f0571ac32 in __GI___assert_fail (assertion=0xdf6f50 "rerun_analysis_requirements.empty()", file=0xdf4e90 "/home/xyyue/legion/runtime/legion/legion_tasks.cc", line=8818, function=0xdfd980 <LegionRuntime::HighLevel::IndexTask::trigger_dependence_analysis()::__PRETTY_FUNCTION__> "virtual void LegionRuntime::HighLevel::IndexTask::trigger_dependence_analysis()") at assert.c:101
[0] #10 0x0000000000a67221 in LegionRuntime::HighLevel::IndexTask::trigger_dependence_analysis (this=0x1774420) at /home/xyyue/legion/runtime/legion/legion_tasks.cc:8818
[0] #11 0x0000000000c0f04c in LegionRuntime::HighLevel::Internal::high_level_runtime_task (args=0x7f9ec861b180, arglen=12, p=...) at /home/xyyue/legion/runtime/legion/runtime.cc:16022
[0] #12 0x0000000000d4d37a in Realm::Task::execute_on_processor (this=0x7f9ec861d380, p=...) at /home/xyyue/legion/runtime/realm/tasks.cc:106
[0] #13 0x0000000000d4fcb8 in Realm::UserThreadTaskScheduler::execute_task (this=0x172cb00, task=0x7f9ec861d380) at /home/xyyue/legion/runtime/realm/tasks.cc:923
[0] #14 0x0000000000d4e343 in Realm::ThreadedTaskScheduler::scheduler_loop (this=0x172cb00) at /home/xyyue/legion/runtime/realm/tasks.cc:482
[0] #15 0x0000000000d530e6 in Realm::Thread::thread_entry_wrapper<Realm::ThreadedTaskScheduler, &Realm::ThreadedTaskScheduler::scheduler_loop> (obj=0x172cb00) at /home/xyyue/legion/runtime/realm/threads.inl:127
[0] #16 0x0000000000cf1c81 in Realm::UserThread::uthread_entry () at /home/xyyue/legion/runtime/realm/threads.cc:747
[0] #17 0x00007f9f057348b0 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
[0] #18 0x0000000000000000 in ?? ()
[0]
[0] Thread 2 (Thread 0x7f9f045d5700 (LWP 29213)):
[0] #0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
[0] #1  0x0000000000ca7b4c in GASNetCondVar::wait (this=0x172d320) at /home/xyyue/legion/runtime/activemsg.h:169
[0] #2  0x0000000000d4d52f in Realm::ThreadedTaskScheduler::WorkCounter::wait_for_work (this=0x172d2e8, old_counter=15) at /home/xyyue/legion/runtime/realm/tasks.cc:230
[0] #3  0x0000000000d4e702 in Realm::ThreadedTaskScheduler::wait_for_work (this=0x172d1e0, old_work_counter=15) at /home/xyyue/legion/runtime/realm/tasks.cc:574
[0] #4  0x0000000000d4fee9 in Realm::UserThreadTaskScheduler::wait_for_work (this=0x172d1e0, old_work_counter=15) at /home/xyyue/legion/runtime/realm/tasks.cc:1017
[0] #5  0x0000000000d4e636 in Realm::ThreadedTaskScheduler::scheduler_loop (this=0x172d1e0) at /home/xyyue/legion/runtime/realm/tasks.cc:550
[0] #6  0x0000000000d530e6 in Realm::Thread::thread_entry_wrapper<Realm::ThreadedTaskScheduler, &Realm::ThreadedTaskScheduler::scheduler_loop> (obj=0x172d1e0) at /home/xyyue/legion/runtime/realm/threads.inl:127
[0] #7  0x0000000000cf1c81 in Realm::UserThread::uthread_entry () at /home/xyyue/legion/runtime/realm/threads.cc:747
[0] #8  0x00007f9f057348b0 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
[0] #9  0x0000000000000000 in ?? ()
[0]
[0] Thread 1 (Thread 0x7f9f06ddd7c0 (LWP 29205)):
[0] #0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
[0] #1  0x0000000000ca7b4c in GASNetCondVar::wait (this=0x15c88f8) at /home/xyyue/legion/runtime/activemsg.h:169
[0] #2  0x0000000000ca6753 in Realm::RuntimeImpl::run (this=0x15c8820, task_id=0, style=Realm::Runtime::ONE_TASK_ONLY, args=0x0, arglen=0, background=false) at /home/xyyue/legion/runtime/realm/runtime_impl.cc:1121
[0] #3  0x0000000000ca22c0 in Realm::Runtime::run (this=0x7fffd4d7a280, task_id=0, style=Realm::Runtime::ONE_TASK_ONLY, args=0x0, arglen=0, background=false) at /home/xyyue/legion/runtime/realm/runtime_impl.cc:136
[0] #4  0x0000000000c0cf04 in LegionRuntime::HighLevel::Internal::start (argc=3, argv=0x7fffd4d7a468, background=false) at /home/xyyue/legion/runtime/legion/runtime.cc:15025
[0] #5  0x00000000009fa7ad in LegionRuntime::HighLevel::Runtime::start (argc=3, argv=0x7fffd4d7a468, background=false) at /home/xyyue/legion/runtime/legion/legion.cc:3694
[0] #6  0x00000000009ccbdd in main (argc=3, argv=0x7fffd4d7a468) at main.cc:318
Aborted
@lightsighter
Contributor

That's an interesting failure mode. Do you think you can make a minimal test case and attach it to this issue?

@alex-lee678294
Author

I have narrowed down the problem: it is in the constructor of the Index Launcher. I add some Region Requirements there, and if I delete them, things work well. I will further check the validity of the Logical Regions. In the meantime, could you please give me some hints on the possible reasons for this? Thanks a lot!!

@lightsighter
Contributor

I don't think this is your fault, but is instead a bug in the runtime. It would be good if you could create a small test case for me to work with so I can better understand the nature of the problem.

@lightsighter
Contributor

Actually, I have a guess as to the cause. Here are a few questions that will help narrow down the problem. There should be one region requirement in your index space launch that is causing this issue.

  • What are the privileges on this region requirement (read-only, read-write, reduce)?
  • Is this a projection region requirement?
  • If yes, what is the projection function you are using?
  • If it is a projection region requirement with a custom projection, are the regions it computes disjoint from each other?
  • If it is a projection region requirement, but you are using the default projection function, is the partition you are projecting over disjoint or aliased?

My hypothesis is that you are getting interfering region requirements for multiple points in your index space task launch and the runtime is not doing a good job reporting this error. However, I would like to confirm this is the issue first. Answers to those questions will help. A small test case that reproduces the problem would be even better.
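
To make the hypothesis concrete, here is a minimal standalone sketch of what "interfering" means for the points of one launch. It is a toy model and not the Legion API or the runtime's actual analysis: regions are modeled as half-open index ranges, and `PointRequirement`, `overlaps`, `privileges_interfere`, and `launch_interferes` are hypothetical names invented for illustration. It also assumes that two reductions always use the same reduction operator.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model, NOT the Legion API: a "region" is a half-open range [lo, hi)
// of element indices, and each point of the launch has one requirement.
enum class Privilege { ReadOnly, ReadWrite, Reduce };

struct PointRequirement {
  long lo, hi;          // elements this point task touches
  Privilege privilege;  // how the point task accesses them
};

// Two ranges overlap when neither one ends before the other begins.
bool overlaps(const PointRequirement &a, const PointRequirement &b) {
  assert(a.lo <= a.hi && b.lo <= b.hi);
  return a.lo < b.hi && b.lo < a.hi;
}

// Simplifying assumption: two readers never interfere, and two reductions
// are taken to use the same operator, so they do not interfere either.
// Every other combination involves a writer and is treated as interfering.
bool privileges_interfere(Privilege a, Privilege b) {
  if (a == Privilege::ReadOnly && b == Privilege::ReadOnly) return false;
  if (a == Privilege::Reduce && b == Privilege::Reduce) return false;
  return true;
}

// Returns true when some pair of points in the launch interferes, i.e.
// their regions overlap AND their privileges conflict.
bool launch_interferes(const std::vector<PointRequirement> &points) {
  for (std::size_t i = 0; i < points.size(); ++i)
    for (std::size_t j = i + 1; j < points.size(); ++j)
      if (overlaps(points[i], points[j]) &&
          privileges_interfere(points[i].privilege, points[j].privilege))
        return true;
  return false;
}
```

In this model, read-write points over a disjoint partition (say `[0, 10)` and `[10, 20)`) pass the check, while the same privileges over an aliased partition (say `[0, 10)` and `[5, 15)`) are flagged. That is why the questions above ask about both the privileges and the disjointness of the partition.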

@alex-lee678294
Author

Thanks a lot for the message.

• Actually, I have tried adding several region requirements; some of them are read-only and some are read-write. Adding any of them causes the failure.
• They are all projection region requirements and I’m using the default projection function.
• The partition I’m using is disjoint.

I do know that all the points within an index space task launch are required to be non-interfering with each other, either because they use disjoint regions or because they have non-interfering privileges. I’ll double check; maybe I got something wrong in my code. At the same time, I’ll try to reduce the problem and provide a small test case.

Thanks again!!

@lightsighter
Contributor

Hmm interesting. Yeah, a small test case would be very useful.

@alex-lee678294
Author

Just an update. I have figured out that the reason for some of the failures is that I forgot to unmap the physical region before using the aliased logical region in the subtasks. But one region requirement is still causing a failure; I'll keep trying to figure it out...

@lightsighter
Contributor

Ok, even if that is the case, the runtime should automatically be unmapping and then remapping the conflicting region around your index space task launch (it does this implicitly to avoid deadlock, but it will incur a performance cost). Play around with that to see if it is impacting the manifestation of the bug.

@alex-lee678294
Author

Do you mean that whether or not there are unmapping statements in my code won't affect its correctness? But the fact is that if I do the unmapping myself, it works; if I then simply comment out the single unmapping statement, the error occurs...

@lightsighter
Contributor

Yes, the unmapping statements should only impact the performance of your code. If it changes the correctness, then that is a runtime bug. If that is part of the problem, make sure to include that in the small example you are working on generating.

@lightsighter lightsighter self-assigned this Nov 9, 2015
@lightsighter
Contributor

This issue has been resolved. It was the result of poor warning messages being issued for interfering region requirements.
