Realm Returns Invalid Instance ID Non-Deterministically Under Allocation Failure #419

Closed
lightsighter opened this issue Aug 13, 2018 · 2 comments

@lightsighter
Contributor

Sometimes when an instance allocation fails, Realm non-deterministically returns an ID in the profiling response that is not a valid instance ID. This is problematic because the client (in this case Legion) still has to call 'destroy' on the instance to reclaim its resources. When that call is made with the invalid ID, Realm hits the following assertion:

#3 0x00007ffff0f9dc82 in __GI___assert_fail (assertion=0x45e51db "id.is_instance()",
file=0x45e4ff0 "/gpfs/fs1/mbauer/legion/runtime/realm/mem_impl.cc", line=292,
function=0x45e5c40 Realm::MemoryImpl::get_instance(Realm::RegionInstance)::__PRETTY_FUNCTION__ "Realm::RegionInstanceImpl* Realm::MemoryImpl::get_instance(Realm::RegionInstance)") at assert.c:101
#4 0x0000000003baded9 in Realm::MemoryImpl::get_instance (this=0x12776a20, i=...)
at /gpfs/fs1/mbauer/legion/runtime/realm/mem_impl.cc:292
#5 0x0000000003bae912 in Realm::MemoryImpl::release_instance_storage (this=0x12776a20, i=..., precondition=...)
at /gpfs/fs1/mbauer/legion/runtime/realm/mem_impl.cc:457
#6 0x0000000003bbebc3 in Realm::RegionInstance::destroy (this=0x7fc5a8402e18, wait_on=...)
at /gpfs/fs1/mbauer/legion/runtime/realm/inst_impl.cc:251
#7 0x0000000002fbf400 in Legion::Internal::InstanceBuilder::handle_profiling_response (this=0x7fc5a8402d20,
response=...) at /gpfs/fs1/mbauer/legion/runtime/legion/legion_instances.cc:2205
#8 0x00000000033fd64a in Legion::Internal::Runtime::profiling_runtime_task (args=0x7fc5a8007a50, arglen=32,
userdata=0x6827ed0, userlen=8, p=...) at /gpfs/fs1/mbauer/legion/runtime/legion/runtime.cc:22554
#9 0x0000000003ba6e26 in Realm::LocalTaskProcessor::execute_task (this=0x1277d130, func_id=5, task_args=...)
at /gpfs/fs1/mbauer/legion/runtime/realm/proc_impl.cc:945
#10 0x0000000003614e1b in Realm::Task::execute_on_processor (this=0x7fc5a8606580, p=...)
at /gpfs/fs1/mbauer/legion/runtime/realm/tasks.cc:175
#11 0x0000000003617f94 in Realm::UserThreadTaskScheduler::execute_task (this=0x1277d4c0, task=0x7fc5a8606580)
at /gpfs/fs1/mbauer/legion/runtime/realm/tasks.cc:1084
#12 0x00000000036163d7 in Realm::ThreadedTaskScheduler::scheduler_loop (this=0x1277d4c0)
at /gpfs/fs1/mbauer/legion/runtime/realm/tasks.cc:593
#13 0x000000000361c4ca in Realm::Thread::thread_entry_wrapper<Realm::ThreadedTaskScheduler, &Realm::ThreadedTaskScheduler::scheduler_loop> (obj=0x1277d4c0) at /gpfs/fs1/mbauer/legion/runtime/realm/threads.inl:131
#14 0x00000000035f6ee7 in Realm::UserThread::uthread_entry () at /gpfs/fs1/mbauer/legion/runtime/realm/threads.cc:981
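
To make the failure mode in the trace above concrete, here is a minimal, self-contained sketch of it. Everything in it is hypothetical: `MockID`, its tag layout, `ProfilingResult`, and `destroy_checked` are invented for illustration and are not Realm's real types or ID encoding. It only models the idea that the destroy path looks the instance up by ID and requires `is_instance()` to hold, so a client-side check could report a bad ID instead of aborting. It is a diagnostic aid under those assumptions, not a fix for the underlying problem.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical ID encoding: the top 4 bits carry a type tag.
// This layout is an assumption for illustration, NOT Realm's real format.
constexpr std::uint64_t kTagShift = 60;
constexpr std::uint64_t kInstanceTag = 0x4;

struct MockID {
  std::uint64_t bits;
  // Mirrors the spirit of the failing check "id.is_instance()".
  bool is_instance() const { return (bits >> kTagShift) == kInstanceTag; }
};

// Build a mock ID from a tag and an index (hypothetical helper).
inline MockID make_id(std::uint64_t tag, std::uint64_t index) {
  return MockID{(tag << kTagShift) | index};
}

// Hypothetical stand-in for what the client receives in a profiling response.
struct ProfilingResult {
  bool alloc_succeeded;
  MockID inst;
};

enum class DestroyOutcome { Destroyed, ReportedInvalidId };

// Client-side guard: even after a failed allocation the client must call
// destroy, so validate the ID first and report it rather than tripping an
// assertion deep inside the runtime.
inline DestroyOutcome destroy_checked(const ProfilingResult& r) {
  if (!r.inst.is_instance()) {
    return DestroyOutcome::ReportedInvalidId;
  }
  // A real client would call the runtime's destroy here.
  return DestroyOutcome::Destroyed;
}
```

In this model, a failed allocation that still carries a well-formed instance ID goes through `destroy_checked` normally, while a malformed ID is surfaced to the caller for debugging.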

@streichler
Contributor

Commit 48b3298 fixes at least one likely cause of this. @lightsighter, have you seen any problems with this since?

@lightsighter
Contributor Author

No, I think we found the cause and it is safe to close the issue.
