Locally mapped but remote execution for non-leaf task bug #79

Closed
magnatelee opened this issue Oct 21, 2015 · 5 comments

@magnatelee
Contributor

Running the C++ miniaero (non-spmd version) with a locally mapping mapper gives me the following assertion failure:

mini-Aero.exe: /home/wclee/Workspace/legion/runtime/legion/legion_tasks.cc:9620: void LegionRuntime::HighLevel::IndexTask::return_slice_complete(unsigned int): Assertion `complete_points <= total_points' failed.

Here is the stacktrace:

(gdb) bt
#0 0x00007fb6f57f39bd in nanosleep () at ../sysdeps/unix/syscall-template.S:81
#1 0x00007fb6f57f3854 in __sleep (seconds=0) at ../sysdeps/unix/sysv/linux/sleep.c:137
#2 0x0000000000cbf3f8 in Realm::realm_freeze (signal=6) at /home/wclee/Workspace/legion/runtime/realm/runtime_impl.cc:75
#3
#4 0x00007fb6f5768bb9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#5 0x00007fb6f576bfc8 in __GI_abort () at abort.c:89
#6 0x00007fb6f5761a76 in __assert_fail_base (fmt=0x7fb6f58b3370 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0xe18998 "complete_points <= total_points", file=file@entry=0xe16200 "/home/wclee/Workspace/legion/runtime/legion/legion_tasks.cc", line=line@entry=9620, function=function@entry=0xe1f480 <LegionRuntime::HighLevel::IndexTask::return_slice_complete(unsigned int)::__PRETTY_FUNCTION__> "void LegionRuntime::HighLevel::IndexTask::return_slice_complete(unsigned int)") at assert.c:92
#7 0x00007fb6f5761b22 in __GI___assert_fail (assertion=0xe18998 "complete_points <= total_points", file=0xe16200 "/home/wclee/Workspace/legion/runtime/legion/legion_tasks.cc", line=9620, function=0xe1f480 <LegionRuntime::HighLevel::IndexTask::return_slice_complete(unsigned int)::__PRETTY_FUNCTION__> "void LegionRuntime::HighLevel::IndexTask::return_slice_complete(unsigned int)") at assert.c:101

#8 0x0000000000a834d8 in LegionRuntime::HighLevel::IndexTask::return_slice_complete (this=0x334aba0, points=1) at /home/wclee/Workspace/legion/runtime/legion/legion_tasks.cc:9620
#9 0x0000000000a83b23 in LegionRuntime::HighLevel::IndexTask::unpack_slice_complete (this=0x334aba0, derez=...) at /home/wclee/Workspace/legion/runtime/legion/legion_tasks.cc:9738
#10 0x0000000000a83c5a in LegionRuntime::HighLevel::IndexTask::process_slice_complete (derez=...) at /home/wclee/Workspace/legion/runtime/legion/legion_tasks.cc:9767
#11 0x0000000000c22cca in LegionRuntime::HighLevel::Runtime::handle_slice_remote_complete (this=0x331fd80, derez=...) at /home/wclee/Workspace/legion/runtime/legion/runtime.cc:11672
#12 0x0000000000c08cdb in LegionRuntime::HighLevel::VirtualChannel::handle_messages (this=0x7fb0b11108e0, num_messages=6, runtime=0x331fd80, remote_address_space=1, args=0x7fb0a054a604 "\240\253\064\003", arglen=160) at /home/wclee/Workspace/legion/runtime/legion/runtime.cc:3352
#13 0x0000000000c08871 in LegionRuntime::HighLevel::VirtualChannel::process_message (this=0x7fb0b11108e0, args=0x7fb0a054a1cc, arglen=1232, runtime=0x331fd80, remote_address_space=1) at /home/wclee/Workspace/legion/runtime/legion/runtime.cc:3145
#14 0x0000000000c09789 in LegionRuntime::HighLevel::MessageManager::receive_message (this=0x7fb0bb3fad00, args=0x7fb0a054a1c8, arglen=1240) at /home/wclee/Workspace/legion/runtime/legion/runtime.cc:3771
#15 0x0000000000c23999 in LegionRuntime::HighLevel::Runtime::process_message_task (this=0x331fd80, args=0x7fb0a054a1c4, arglen=1244) at /home/wclee/Workspace/legion/runtime/legion/runtime.cc:12179
#16 0x0000000000c2d39c in LegionRuntime::HighLevel::Runtime::high_level_runtime_task (args=0x7fb0a054a1c0, arglen=1248, p=...) at /home/wclee/Workspace/legion/runtime/legion/runtime.cc:15795
#17 0x0000000000d207e4 in Realm::Task::execute_on_processor (this=0x7fb0a05d25c0, p=...) at /home/wclee/Workspace/legion/runtime/realm/tasks.cc:80
#18 0x0000000000d22a66 in Realm::UserThreadTaskScheduler::execute_task (this=0x3306c20, task=0x7fb0a05d25c0) at /home/wclee/Workspace/legion/runtime/realm/tasks.cc:887
#19 0x0000000000d21527 in Realm::ThreadedTaskScheduler::scheduler_loop (this=0x3306c20) at /home/wclee/Workspace/legion/runtime/realm/tasks.cc:448
#20 0x0000000000d2625e in Realm::Thread::thread_entry_wrapper<Realm::ThreadedTaskScheduler, &Realm::ThreadedTaskScheduler::scheduler_loop> (obj=0x3306c20) at /home/wclee/Workspace/legion/runtime/realm/threads.inl:127
#21 0x0000000000d0e335 in Realm::UserThread::uthread_entry () at /home/wclee/Workspace/legion/runtime/realm/threads.cc:740
#22 0x00007fb6f577b7a0 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#23 0x0000000000000000 in ?? ()

@lightsighter
Contributor

The problem is that when we map a task locally and then send it to a remote node to execute, the remote node does not send the 'mapped' message back to the owner node if the task is not a leaf task. This affects both individual tasks and slice tasks with any number of point tasks. In these cases, once the task has finished running and all of its children have mapped, the remote node needs to send a message to the copy of the task on the owner node saying that the task has mapped. For slice tasks, the condition is that all the point tasks have finished running and all of their children have mapped. I would recommend figuring this out for individual tasks first, since their state machines are simpler, and then doing it for slice tasks.
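
The condition described above can be sketched as a small state machine. This is only an illustrative model, not Legion's actual code: `OwnerCopy`, `RemoteSlice`, and all of their members are hypothetical names invented for this sketch, and the real runtime sends messages between nodes rather than calling a method directly. An individual task can be modeled as a slice with one point, and a leaf task as one that never launches children.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the copy of the task kept on the owner node.
struct OwnerCopy {
  std::size_t total_points;
  std::size_t mapped_points = 0;

  explicit OwnerCopy(std::size_t total) : total_points(total) {}

  // Called when a remote slice reports that its points have mapped.
  void receive_mapped(std::size_t points) {
    mapped_points += points;
    // Same kind of bound check as the failing assertion in the report,
    // applied here to the hypothetical mapped-point counter.
    assert(mapped_points <= total_points);
  }

  bool all_mapped() const { return mapped_points == total_points; }
};

// Hypothetical stand-in for a locally mapped slice executing on a remote node.
class RemoteSlice {
public:
  RemoteSlice(OwnerCopy &owner, std::size_t num_points)
      : owner_(owner), num_points_(num_points) {}

  // A point task launched a child task that has not mapped yet.
  void child_launched() { ++unmapped_children_; }

  // One previously launched child task has mapped.
  void child_mapped() {
    assert(unmapped_children_ > 0);
    --unmapped_children_;
    maybe_notify_owner();
  }

  // One point task of this slice has finished running.
  void point_finished() {
    assert(finished_points_ < num_points_);
    ++finished_points_;
    maybe_notify_owner();
  }

  bool notified() const { return notified_; }

private:
  // Send 'mapped' back exactly once, and only when every point has
  // finished running and every child task has mapped.
  void maybe_notify_owner() {
    if (notified_) return;
    if (finished_points_ < num_points_) return;
    if (unmapped_children_ > 0) return;
    notified_ = true;
    owner_.receive_mapped(num_points_);
  }

  OwnerCopy &owner_;
  std::size_t num_points_;
  std::size_t finished_points_ = 0;
  std::size_t unmapped_children_ = 0;
  bool notified_ = false;
};
```

In this model, a slice with two points and one outstanding child stays silent after both calls to `point_finished()` and reports to the owner only once `child_mapped()` runs. The `notified_` flag keeps the report from being sent twice, which would otherwise push the owner's counter past `total_points`.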

Assigning this back to Wonchan to see if he can figure it out. If not I will get around to it sometime in the next week.

@lightsighter
Contributor

I might have some time to work on this tomorrow. Do you want to commit what you have and I'll fill in the rest? If you want to keep working on it that is fine too.

@magnatelee
Contributor Author

I pushed a change that I think might fix this issue. Please review commit 0251ee0.

@magnatelee
Contributor Author

Pushed another commit 5e1d50f. I'm quite sure this will fix the issue.

@magnatelee magnatelee assigned lightsighter and unassigned magnatelee Nov 2, 2015
@lightsighter
Contributor

This issue is resolved as of commit 538b3d3
