Example dmesg:
[ 484.355618] INFO: task app:8986 blocked for more than 120 seconds.
[ 484.355643] Tainted: G OE 5.15.0-124-generic #134-Ubuntu
[ 484.355665] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 484.355688] task:app state:D stack: 0 pid: 8986 ppid: 8985 flags:0x00004002
[ 484.355691] Call Trace:
[ 484.355692] <TASK>
[ 484.355694] __schedule+0x24e/0x590
[ 484.355698] schedule+0x69/0x110
[ 484.355699] schedule_timeout+0x105/0x140
[ 484.355701] ? __queue_delayed_work+0x5c/0xa0
[ 484.355703] ? queue_delayed_work_on+0x3d/0x60
[ 484.355705] __wait_for_common+0xab/0x150
[ 484.355706] ? usleep_range_state+0x90/0x90
[ 484.355708] wait_for_completion+0x24/0x30
[ 484.355709] __synchronize_srcu.part.0+0x7f/0xf0
[ 484.355712] ? __bpf_trace_rcu_stall_warning+0x10/0x10
[ 484.355714] synchronize_srcu+0xfb/0x120
[ 484.355716] mmu_notifier_unregister+0xbc/0xf0
[ 484.355719] sgx_release+0x94/0x140
[ 484.355722] __fput+0x9c/0x280
[ 484.355723] ____fput+0xe/0x20
[ 484.355725] task_work_run+0x6a/0xb0
[ 484.355726] exit_to_user_mode_loop+0x157/0x160
[ 484.355729] exit_to_user_mode_prepare+0xa0/0xb0
[ 484.355731] syscall_exit_to_user_mode+0x27/0x50
[ 484.355733] ? x64_sys_call+0x1e07/0x1fa0
[ 484.355736] do_syscall_64+0x63/0xb0
[ 484.355738] ? exit_to_user_mode_prepare+0x37/0xb0
[ 484.355740] ? syscall_exit_to_user_mode+0x2c/0x50
[ 484.355741] ? x64_sys_call+0x1de6/0x1fa0
[ 484.355743] ? do_syscall_64+0x63/0xb0
[ 484.355744] ? __x64_sys_openat+0x55/0x90
[ 484.355746] ? exit_to_user_mode_prepare+0x37/0xb0
[ 484.355748] ? syscall_exit_to_user_mode+0x2c/0x50
[ 484.355750] ? x64_sys_call+0x1a55/0x1fa0
[ 484.355752] ? do_syscall_64+0x63/0xb0
[ 484.355753] ? x64_sys_call+0x1e3e/0x1fa0
[ 484.355755] ? do_syscall_64+0x63/0xb0
[ 484.355755] ? clear_bhb_loop+0x45/0xa0
[ 484.355758] ? clear_bhb_loop+0x45/0xa0
[ 484.355760] ? clear_bhb_loop+0x45/0xa0
[ 484.355762] ? clear_bhb_loop+0x45/0xa0
[ 484.355764] ? clear_bhb_loop+0x45/0xa0
[ 484.355766] entry_SYSCALL_64_after_hwframe+0x6c/0xd6
[ 484.355768] RIP: 0033:0x7fa070ccba7b
[ 484.355770] RSP: 002b:00007ffd1025fe18 EFLAGS: 00000206 ORIG_RAX: 000000000000000b
[ 484.355772] RAX: 0000000000000000 RBX: 00005580d4c8e8c0 RCX: 00007fa070ccba7b
[ 484.355773] RDX: 0000000000000000 RSI: 0000000000200000 RDI: 00007fa070400000
[ 484.355774] RBP: 00007ffd1025fe94 R08: 00005580d4c8e8c0 R09: 0000000000000000
[ 484.355775] R10: 0000000000000000 R11: 0000000000000206 R12: 00007fa070bac1c0
[ 484.355775] R13: 00007fa070bac188 R14: 00007fa070bac188 R15: 00007fa070bac200
[ 484.355777] </TASK>
The app stays blocked in D state (uninterruptible sleep), and a reboot is the only remedy.
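For anyone trying to confirm the same symptom, here is a minimal sketch that reads the task state from `/proc/<pid>/stat` and, where available, the kernel stack from `/proc/<pid>/stack`. The helper names are made up for illustration, and reading the stack typically requires root and a kernel built with stack trace support.

```python
from pathlib import Path


def parse_stat_state(stat_line: str) -> str:
    """Return the one-letter task state from a /proc/<pid>/stat line.

    The comm field is wrapped in parentheses and may itself contain spaces
    or ')', so split after the LAST ')' instead of splitting on whitespace.
    """
    return stat_line.rsplit(")", 1)[1].split()[0]


def task_state(pid: int) -> str:
    """Read the current state of a task; 'D' means uninterruptible sleep."""
    return parse_stat_state(Path(f"/proc/{pid}/stat").read_text())


def kernel_stack(pid: int) -> str:
    """Best-effort read of the in-kernel stack of a task (assumed to need root)."""
    try:
        return Path(f"/proc/{pid}/stack").read_text()
    except OSError as exc:
        return f"<unavailable: {exc}>"
```

With the pid from the dmesg output, `task_state(8986)` returning `"D"` would match the report, and `kernel_stack(8986)` should show the same `sgx_release` path if the file is readable.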
After some digging, this seems to be caused by an explicit call to sgx_destroy_enclave before the process exits.
From the call trace above, the hang seems to come from the following code paths (the links point to v6.10.10 sources, while the affected kernel is 5.15.0-124):
- https://elixir.bootlin.com/linux/v6.10.10/source/kernel/rcu/srcutree.c#L1425
- https://elixir.bootlin.com/linux/v6.10.10/source/mm/mmu_notifier.c#L829
- https://elixir.bootlin.com/linux/v6.10.10/source/arch/x86/kernel/cpu/sgx/driver.c#L72
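To make the blocking chain easier to read, the frames prefixed with `?` can be dropped: the kernel marks those as addresses found on the stack that are not confirmed to be part of the call chain. A rough sketch follows; the regex is an assumption based on the dmesg format shown above, not a general parser.

```python
import re

# Matches lines like "[ 484.355719] sgx_release+0x94/0x140", with an optional "? " marker.
FRAME = re.compile(
    r"^\[\s*\d+\.\d+\]\s+(\?\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)

SAMPLE = """\
[ 484.355709] __synchronize_srcu.part.0+0x7f/0xf0
[ 484.355712] ? __bpf_trace_rcu_stall_warning+0x10/0x10
[ 484.355714] synchronize_srcu+0xfb/0x120
[ 484.355716] mmu_notifier_unregister+0xbc/0xf0
[ 484.355719] sgx_release+0x94/0x140
"""


def reliable_frames(dmesg_text: str) -> list[str]:
    """Return symbol names of frames that are not marked with '?'."""
    frames = []
    for line in dmesg_text.splitlines():
        match = FRAME.match(line.strip())
        if match and match.group(1) is None:
            frames.append(match.group(2))
    return frames


if __name__ == "__main__":
    # Innermost frame first, as in the trace.
    for name in reliable_frames(SAMPLE):
        print(name)
```

On the excerpt above this leaves `__synchronize_srcu.part.0`, `synchronize_srcu`, `mmu_notifier_unregister` and `sgx_release`, which are the functions the three links refer to.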
Possible hypothesis:
- the kernel creates VMAs (virtual memory areas) for the enclave memory
- sgx-step/app creates PTE/PMD pointers to the enclave's physical memory (via /dev/mem)
- upon destroying the enclave VMAs, the kernel checks reference counters to see whether anyone still holds pointers
- the kernel then waits indefinitely for sgx-step/app to release its pointers to the physical memory
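One way to probe the hypothesis above might be to check whether the process still holds `/dev/mem` mappings at the time of the hang. The sketch below parses text in the `/proc/<pid>/maps` format; the function name and the sample addresses are hypothetical, and whether these mappings are actually what blocks `synchronize_srcu` is unverified.

```python
def dev_mem_mappings(maps_text: str) -> list[tuple[int, int, str]]:
    """Return (start, end, perms) for every /dev/mem mapping in maps text.

    Each /proc/<pid>/maps line has the form:
        start-end perms offset dev inode pathname
    """
    hits = []
    for line in maps_text.splitlines():
        fields = line.split(None, 5)
        if len(fields) == 6 and fields[5].strip() == "/dev/mem":
            start, end = (int(part, 16) for part in fields[0].split("-"))
            hits.append((start, end, fields[1]))
    return hits


# Hypothetical sample input, not taken from the affected machine.
SAMPLE_MAPS = """\
5580d4c00000-5580d4c21000 rw-p 00000000 00:00 0 [heap]
7fa070bac000-7fa070bad000 rw-s 1a2b3000 00:05 6 /dev/mem
7fa070ccb000-7fa070e00000 r-xp 00028000 08:01 1234 /usr/lib/x86_64-linux-gnu/libc.so.6
"""
```

On a live system the input would come from reading `/proc/8986/maps`. A non-empty result after sgx_destroy_enclave would be consistent with the hypothesis, though it would not prove it.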