Locally mapped but remote execution for non-leaf task bug #79
The problem here is that when we map a task locally and then send it to a remote node, we're not properly sending the 'mapped' message back from the remote node to the owner node when the task is not a leaf task. This applies both to individual tasks and to slice tasks with any number of point tasks. For these cases, once the task has finished running and all of its children have mapped, we need to send the proper message back to the copy of the task on the owner node saying that the task has mapped. For slice tasks, the condition is that all of the point tasks have finished running and all of their children have mapped. I would recommend figuring this out for individual tasks first, since their state machines are simpler, and then doing it for slice tasks. Assigning this back to Wonchan to see if he can figure it out. If not, I will get around to it sometime in the next week.
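The completion condition described above can be sketched as a small model. This is a minimal sketch under stated assumptions, not Legion's runtime code: the class names `RemotePointState` and `RemoteSliceState`, the `notify_owner` callback (standing in for the message to the owner node), and the `run_example` helper are all hypothetical. An individual task is modelled as a slice with one point, which matches the suggestion to get the single-task case working first.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical model, not Legion's actual classes: one point of a slice that
// was mapped locally and then sent to a remote node to run.
class RemotePointState {
 public:
  // The task body finished; the number of child tasks it launched is now known.
  void task_finished(std::size_t num_children) {
    assert(!finished_);
    finished_ = true;
    expected_children_ = num_children;
  }
  // One child of this point has mapped (may arrive before task_finished).
  void child_mapped() { ++mapped_children_; }
  // Mapped only once the task has run AND every child it launched has mapped.
  bool mapped() const {
    return finished_ && mapped_children_ == expected_children_;
  }

 private:
  bool finished_ = false;
  std::size_t expected_children_ = 0;
  std::size_t mapped_children_ = 0;
};

// A slice of N points; an individual task is modelled as a slice with N == 1.
// notify_owner is a hypothetical stand-in for the "mapped" message to the owner.
class RemoteSliceState {
 public:
  RemoteSliceState(std::size_t num_points, std::function<void()> notify_owner)
      : points_(num_points), notify_owner_(std::move(notify_owner)) {}

  void point_finished(std::size_t p, std::size_t num_children) {
    points_[p].task_finished(num_children);
    maybe_notify();
  }
  void child_mapped(std::size_t p) {
    points_[p].child_mapped();
    maybe_notify();
  }

 private:
  // Send the message exactly once, after every point has mapped.
  void maybe_notify() {
    if (notified_) return;
    for (const RemotePointState& pt : points_)
      if (!pt.mapped()) return;
    notified_ = true;
    notify_owner_();
  }

  std::vector<RemotePointState> points_;
  std::function<void()> notify_owner_;
  bool notified_ = false;
};

// Usage: a two-point slice; returns how many "mapped" messages were sent.
int run_example() {
  int messages = 0;
  RemoteSliceState slice(2, [&messages] { ++messages; });
  slice.point_finished(0, 1);  // point 0 ran and launched one child
  slice.child_mapped(0);       // that child has now mapped
  slice.point_finished(1, 0);  // point 1 ran and launched no children
  return messages;             // 1: sent once, after the last point mapped
}
```

The design choice here is that the message is gated on two independent events per point (the body finishing and the last child mapping), which can arrive in either order, so each event re-checks the whole slice and a flag ensures only one message is sent.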
I might have some time to work on this tomorrow. Do you want to commit what you have and I'll fill in the rest? If you want to keep working on it, that is fine too.
I pushed a change that I think might fix this issue. Please review commit 0251ee0.
Pushed another commit, 5e1d50f. I'm quite sure this will fix the issue.
This issue is resolved as of commit 538b3d3.
Running the C++ miniaero (non-SPMD version) with a mapper that maps tasks locally gives me the following assertion failure:
mini-Aero.exe: /home/wclee/Workspace/legion/runtime/legion/legion_tasks.cc:9620: void LegionRuntime::HighLevel::IndexTask::return_slice_complete(unsigned int): Assertion `complete_points <= total_points' failed.
Here is the stack trace:
(gdb) bt
#0 0x00007fb6f57f39bd in nanosleep () at ../sysdeps/unix/syscall-template.S:81
#1 0x00007fb6f57f3854 in __sleep (seconds=0) at ../sysdeps/unix/sysv/linux/sleep.c:137
#2 0x0000000000cbf3f8 in Realm::realm_freeze (signal=6) at /home/wclee/Workspace/legion/runtime/realm/runtime_impl.cc:75
#3
#4 0x00007fb6f5768bb9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#5 0x00007fb6f576bfc8 in __GI_abort () at abort.c:89
#6 0x00007fb6f5761a76 in __assert_fail_base (fmt=0x7fb6f58b3370 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0xe18998 "complete_points <= total_points", file=file@entry=0xe16200 "/home/wclee/Workspace/legion/runtime/legion/legion_tasks.cc", line=line@entry=9620,
#7 0x00007fb6f5761b22 in __GI___assert_fail (assertion=0xe18998 "complete_points <= total_points", file=0xe16200 "/home/wclee/Workspace/legion/runtime/legion/legion_tasks.cc", line=9620,
#8 0x0000000000a834d8 in LegionRuntime::HighLevel::IndexTask::return_slice_complete (this=0x334aba0, points=1) at /home/wclee/Workspace/legion/runtime/legion/legion_tasks.cc:9620
#9 0x0000000000a83b23 in LegionRuntime::HighLevel::IndexTask::unpack_slice_complete (this=0x334aba0, derez=...) at /home/wclee/Workspace/legion/runtime/legion/legion_tasks.cc:9738
#10 0x0000000000a83c5a in LegionRuntime::HighLevel::IndexTask::process_slice_complete (derez=...) at /home/wclee/Workspace/legion/runtime/legion/legion_tasks.cc:9767
#11 0x0000000000c22cca in LegionRuntime::HighLevel::Runtime::handle_slice_remote_complete (this=0x331fd80, derez=...) at /home/wclee/Workspace/legion/runtime/legion/runtime.cc:11672
#12 0x0000000000c08cdb in LegionRuntime::HighLevel::VirtualChannel::handle_messages (this=0x7fb0b11108e0, num_messages=6, runtime=0x331fd80, remote_address_space=1, args=0x7fb0a054a604 "\240\253\064\003", arglen=160) at /home/wclee/Workspace/legion/runtime/legion/runtime.cc:3352
#13 0x0000000000c08871 in LegionRuntime::HighLevel::VirtualChannel::process_message (this=0x7fb0b11108e0, args=0x7fb0a054a1cc, arglen=1232, runtime=0x331fd80, remote_address_space=1) at /home/wclee/Workspace/legion/runtime/legion/runtime.cc:3145
#14 0x0000000000c09789 in LegionRuntime::HighLevel::MessageManager::receive_message (this=0x7fb0bb3fad00, args=0x7fb0a054a1c8, arglen=1240) at /home/wclee/Workspace/legion/runtime/legion/runtime.cc:3771
#15 0x0000000000c23999 in LegionRuntime::HighLevel::Runtime::process_message_task (this=0x331fd80, args=0x7fb0a054a1c4, arglen=1244) at /home/wclee/Workspace/legion/runtime/legion/runtime.cc:12179
#16 0x0000000000c2d39c in LegionRuntime::HighLevel::Runtime::high_level_runtime_task (args=0x7fb0a054a1c0, arglen=1248, p=...) at /home/wclee/Workspace/legion/runtime/legion/runtime.cc:15795
#17 0x0000000000d207e4 in Realm::Task::execute_on_processor (this=0x7fb0a05d25c0, p=...) at /home/wclee/Workspace/legion/runtime/realm/tasks.cc:80
#18 0x0000000000d22a66 in Realm::UserThreadTaskScheduler::execute_task (this=0x3306c20, task=0x7fb0a05d25c0) at /home/wclee/Workspace/legion/runtime/realm/tasks.cc:887
#19 0x0000000000d21527 in Realm::ThreadedTaskScheduler::scheduler_loop (this=0x3306c20) at /home/wclee/Workspace/legion/runtime/realm/tasks.cc:448
#20 0x0000000000d2625e in Realm::Thread::thread_entry_wrapper<Realm::ThreadedTaskScheduler, &Realm::ThreadedTaskScheduler::scheduler_loop> (obj=0x3306c20) at /home/wclee/Workspace/legion/runtime/realm/threads.inl:127
#21 0x0000000000d0e335 in Realm::UserThread::uthread_entry () at /home/wclee/Workspace/legion/runtime/realm/threads.cc:740
#22 0x00007fb6f577b7a0 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#23 0x0000000000000000 in ?? ()
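The failing assertion checks a simple counting invariant on the owner node: the number of points reported complete by returning slices must never exceed the total number of points in the index task. Exceeding it means some points were counted more than once (or the total was under-counted). The sketch below only illustrates that arithmetic and is not a diagnosis of this crash. `OwnerPointCounter` and `repeated_report_accepted` are hypothetical names; only `complete_points`, `total_points`, and `return_slice_complete` come from the assertion and backtrace above. Unlike the runtime, this model reports a violating update instead of aborting.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical owner-side model of the counters named in the assertion;
// not Legion's actual IndexTask code.
class OwnerPointCounter {
 public:
  explicit OwnerPointCounter(std::size_t total_points)
      : total_points_(total_points) {}

  // A slice reports that `points` of its points have completed. Returns false
  // (leaving the count unchanged) if accepting the report would break the
  // invariant complete_points <= total_points.
  bool return_slice_complete(std::size_t points) {
    if (points > total_points_ - complete_points_) return false;
    complete_points_ += points;
    return true;
  }
  bool all_complete() const { return complete_points_ == total_points_; }

 private:
  std::size_t total_points_;
  std::size_t complete_points_ = 0;
};

// Usage: two one-point slices report, then one of them reports again.
bool repeated_report_accepted() {
  OwnerPointCounter owner(2);
  owner.return_slice_complete(1);         // slice A
  owner.return_slice_complete(1);         // slice B
  assert(owner.all_complete());           // both points are accounted for
  return owner.return_slice_complete(1);  // slice A again: rejected (false)
}
```

In the real runtime the same overrun trips the assertion in `IndexTask::return_slice_complete` (frame #8 above, called with `points=1`).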