Provided examples work but there is kernel error at the end of execution #16

Closed
keryell opened this issue Feb 16, 2024 · 5 comments

@keryell
Contributor

keryell commented Feb 16, 2024

Hello!
All 3 provided examples seem to work, for example:

...$ ./example_build/example_noop_test /lib/firmware/amdnpu/1502/validate.xclbin 
Host test code start...
Host test code is creating device object...
Host test code is loading xclbin object...
Host test code is creating kernel object...
Host test code kernel name: DPU_PDI_0
Host code is registering xclbin to the device...
Host code is creating hw_context...
Host test code is creating kernel object...
Host test code allocate buffer objects...
Host test code sync buffer objects to device...
Host test code iterations (~10 seconds): 70000
Host test microseconds: 6962790
Host test average latency: 99 us/iter
TEST PASSED!

but when I look at dmesg or /var/log/kern.log, there is a scary warning:

2024-02-15T17:51:15.668654-08:00 rk-xsj kernel: [ 2909.731818] ------------[ cut here ]------------
2024-02-15T17:51:15.668663-08:00 rk-xsj kernel: [ 2909.731821] WARNING: CPU: 9 PID: 42463 at drivers/iommu/io-pgfault.c:249 iopf_queue_flush_dev+0x2f/0x40
2024-02-15T17:51:15.668664-08:00 rk-xsj kernel: [ 2909.731827] Modules linked in: amdxdna(OE) drm_shmem_helper xocl(OE) xclmgmt(OE) hid_logitech_hidpp hid_logitech_dj snd_usb_audio snd_usbmidi_lib snd_ump rfcomm snd_seq_dummy snd_hrtimer xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo xt_addrtype nft_compat nf_tables libcrc32c nfnetlink br_netfilter bridge stp llc ipmi_devintf ipmi_msghandler nvme_fabrics vboxnetadp(OE) vboxnetflt(OE) vboxdrv(OE) ccm overlay cmac algif_hash algif_skcipher af_alg bnep binfmt_misc nls_iso8859_1 intel_rapl_msr joydev intel_rapl_common snd_ctl_led snd_hda_codec_realtek snd_hda_codec_generic snd_sof_amd_acp63 snd_sof_amd_vangogh ledtrig_audio snd_sof_amd_rembrandt snd_sof_amd_renoir snd_sof_amd_acp snd_sof_pci snd_sof_xtensa_dsp snd_hda_codec_hdmi snd_sof snd_hda_intel snd_sof_utils snd_intel_dspcfg snd_intel_sdw_acpi snd_soc_core edac_mce_amd snd_hda_codec mt7921e mt7921_common btusb snd_hda_core snd_compress uvcvideo btrtl mt792x_lib ac97_bus snd_hwdep
2024-02-15T17:51:15.668665-08:00 rk-xsj kernel: [ 2909.731881]  kvm_amd btintel snd_pcm_dmaengine videobuf2_vmalloc mt76_connac_lib btbcm uvc mt76 snd_seq_midi snd_pci_ps btmtk videobuf2_memops snd_seq_midi_event snd_rpl_pci_acp6x videobuf2_v4l2 snd_rawmidi snd_acp_pci mac80211 kvm bluetooth snd_acp_legacy_common videodev snd_pci_acp6x snd_seq snd_pcm irqbypass videobuf2_common ecdh_generic snd_seq_device crct10dif_pclmul crc32_pclmul hid_multitouch ecc mc snd_pci_acp5x snd_timer polyval_clmulni polyval_generic cfg80211 snd_rn_pci_acp3x ghash_clmulni_intel snd_acp_config sha256_ssse3 ucsi_acpi hp_wmi snd sha1_ssse3 r8169 typec_ucsi snd_soc_acpi sparse_keymap rapl platform_profile wmi_bmof thunderbolt k10temp libarc4 realtek soundcore ccp snd_pci_acp3x i2c_piix4 typec nvidia_uvm(POE) i2c_hid_acpi wireless_hotkey i2c_hid amd_pmc msr parport_pc ppdev nfsd lp parport auth_rpcgss nfs_acl lockd grace efi_pstore sunrpc dmi_sysfs ip_tables x_tables autofs4 dm_crypt hid_generic usbhid hid nvidia_drm(POE) nvidia_modeset(POE) nvidia(POE) amdgpu amdxcp drm_exec gpu_sched
2024-02-15T17:51:15.668665-08:00 rk-xsj kernel: [ 2909.731942]  drm_buddy drm_suballoc_helper drm_ttm_helper ttm drm_display_helper cec rc_core drm_kms_helper input_leds nvme video drm serio_raw xhci_pci nvme_core xhci_pci_renesas i2c_algo_bit wmi mac_hid aesni_intel crypto_simd cryptd
2024-02-15T17:51:15.668666-08:00 rk-xsj kernel: [ 2909.731958] CPU: 9 PID: 42463 Comm: example_noop_te Tainted: P        W  OE      6.7.4+iommu-sva-v4+ #1
2024-02-15T17:51:15.668666-08:00 rk-xsj kernel: [ 2909.731960] Hardware name: HP HP ZBook Power 15.6 inch G10 A Mobile Workstation PC/8B95, BIOS V85 Ver. 01.03.00 09/11/2023
2024-02-15T17:51:15.668667-08:00 rk-xsj kernel: [ 2909.731961] RIP: 0010:iopf_queue_flush_dev+0x2f/0x40
2024-02-15T17:51:15.668667-08:00 rk-xsj kernel: [ 2909.731964] Code: 48 8b 87 d0 02 00 00 48 8b 40 20 48 85 c0 74 1a 55 48 8b 40 40 48 8b 38 48 89 e5 e8 8b 79 61 ff 31 c0 5d 31 ff e9 6c 06 80 00 <0f> 0b b8 ed ff ff ff 31 ff e9 5e 06 80 00 0f 1f 00 90 90 90 90 90
2024-02-15T17:51:15.668667-08:00 rk-xsj kernel: [ 2909.731965] RSP: 0018:ffffb19f0faffcc8 EFLAGS: 00010246
2024-02-15T17:51:15.668667-08:00 rk-xsj kernel: [ 2909.731967] RAX: 0000000000000000 RBX: ffffa05d41aeb0c0 RCX: 0000000000000000
2024-02-15T17:51:15.668668-08:00 rk-xsj kernel: [ 2909.731968] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffa05d41aeb0c0
2024-02-15T17:51:15.668668-08:00 rk-xsj kernel: [ 2909.731969] RBP: ffffb19f0faffd00 R08: 0000000000000000 R09: 0000000000000000
2024-02-15T17:51:15.668668-08:00 rk-xsj kernel: [ 2909.731970] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
2024-02-15T17:51:15.668668-08:00 rk-xsj kernel: [ 2909.731970] R13: ffffa05d41895d80 R14: ffffa0617fc70978 R15: ffffa0617fc70810
2024-02-15T17:51:15.668668-08:00 rk-xsj kernel: [ 2909.731971] FS:  00007fb2bd043c00(0000) GS:ffffa06c75a40000(0000) knlGS:0000000000000000
2024-02-15T17:51:15.668669-08:00 rk-xsj kernel: [ 2909.731973] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2024-02-15T17:51:15.668669-08:00 rk-xsj kernel: [ 2909.731974] CR2: 00007fb2bc945400 CR3: 000000068f1ac000 CR4: 0000000000750ef0
2024-02-15T17:51:15.668670-08:00 rk-xsj kernel: [ 2909.731975] PKRU: 55555554
2024-02-15T17:51:15.668670-08:00 rk-xsj kernel: [ 2909.731976] Call Trace:
2024-02-15T17:51:15.668670-08:00 rk-xsj kernel: [ 2909.731977]  <TASK>
2024-02-15T17:51:15.668671-08:00 rk-xsj kernel: [ 2909.731981]  ? show_regs+0x6d/0x80
2024-02-15T17:51:15.668671-08:00 rk-xsj kernel: [ 2909.731984]  ? __warn+0x89/0x160
2024-02-15T17:51:15.668676-08:00 rk-xsj kernel: [ 2909.731987]  ? iopf_queue_flush_dev+0x2f/0x40
2024-02-15T17:51:15.668676-08:00 rk-xsj kernel: [ 2909.731989]  ? report_bug+0x17e/0x1b0
2024-02-15T17:51:15.668677-08:00 rk-xsj kernel: [ 2909.731993]  ? handle_bug+0x51/0xa0
2024-02-15T17:51:15.668677-08:00 rk-xsj kernel: [ 2909.731996]  ? exc_invalid_op+0x18/0x80
2024-02-15T17:51:15.668678-08:00 rk-xsj kernel: [ 2909.731998]  ? asm_exc_invalid_op+0x1b/0x20
2024-02-15T17:51:15.668678-08:00 rk-xsj kernel: [ 2909.732003]  ? iopf_queue_flush_dev+0x2f/0x40
2024-02-15T17:51:15.668678-08:00 rk-xsj kernel: [ 2909.732005]  ? srso_alias_return_thunk+0x5/0xfbef5
2024-02-15T17:51:15.668678-08:00 rk-xsj kernel: [ 2909.732006]  ? amd_iommu_remove_dev_pasid+0x7d/0x160
2024-02-15T17:51:15.668678-08:00 rk-xsj kernel: [ 2909.732010]  iommu_detach_device_pasid+0x5a/0xa0
2024-02-15T17:51:15.668678-08:00 rk-xsj kernel: [ 2909.732013]  iommu_sva_unbind_device+0x3f/0xa0
2024-02-15T17:51:15.668678-08:00 rk-xsj kernel: [ 2909.732017]  amdxdna_drm_close+0xa5/0x130 [amdxdna]
2024-02-15T17:51:15.668679-08:00 rk-xsj kernel: [ 2909.732024]  drm_file_free+0x1e6/0x260 [drm]
2024-02-15T17:51:15.668679-08:00 rk-xsj kernel: [ 2909.732045]  drm_release+0xc7/0x150 [drm]
2024-02-15T17:51:15.668679-08:00 rk-xsj kernel: [ 2909.732059]  __fput+0x9e/0x2e0
2024-02-15T17:51:15.668679-08:00 rk-xsj kernel: [ 2909.732063]  __fput_sync+0x1c/0x30
2024-02-15T17:51:15.668679-08:00 rk-xsj kernel: [ 2909.732065]  __x64_sys_close+0x3e/0x90
2024-02-15T17:51:15.668679-08:00 rk-xsj kernel: [ 2909.732068]  do_syscall_64+0x5d/0xf0
2024-02-15T17:51:15.668680-08:00 rk-xsj kernel: [ 2909.732070]  ? srso_alias_return_thunk+0x5/0xfbef5
2024-02-15T17:51:15.668680-08:00 rk-xsj kernel: [ 2909.732072]  ? ksys_write+0x73/0x100
2024-02-15T17:51:15.668680-08:00 rk-xsj kernel: [ 2909.732073]  ? srso_alias_return_thunk+0x5/0xfbef5
2024-02-15T17:51:15.668680-08:00 rk-xsj kernel: [ 2909.732075]  ? exit_to_user_mode_prepare+0x39/0x190
2024-02-15T17:51:15.668680-08:00 rk-xsj kernel: [ 2909.732078]  ? srso_alias_return_thunk+0x5/0xfbef5
2024-02-15T17:51:15.668681-08:00 rk-xsj kernel: [ 2909.732080]  ? syscall_exit_to_user_mode+0x37/0x60
2024-02-15T17:51:15.668681-08:00 rk-xsj kernel: [ 2909.732082]  ? srso_alias_return_thunk+0x5/0xfbef5
2024-02-15T17:51:15.668681-08:00 rk-xsj kernel: [ 2909.732083]  ? do_syscall_64+0x6c/0xf0
2024-02-15T17:51:15.668681-08:00 rk-xsj kernel: [ 2909.732085]  ? do_syscall_64+0x6c/0xf0
2024-02-15T17:51:15.668681-08:00 rk-xsj kernel: [ 2909.732087]  ? exc_page_fault+0x94/0x1b0
2024-02-15T17:51:15.668682-08:00 rk-xsj kernel: [ 2909.732089]  entry_SYSCALL_64_after_hwframe+0x6e/0x76
2024-02-15T17:51:15.668682-08:00 rk-xsj kernel: [ 2909.732091] RIP: 0033:0x7fb2bc7157c4
2024-02-15T17:51:15.668682-08:00 rk-xsj kernel: [ 2909.732093] Code: 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 80 3d 85 0d 0f 00 00 74 13 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 44 c3 0f 1f 00 48 83 ec 18 89 7c 24 0c e8 13
2024-02-15T17:51:15.668682-08:00 rk-xsj kernel: [ 2909.732094] RSP: 002b:00007ffd629c61e8 EFLAGS: 00000202 ORIG_RAX: 0000000000000003
2024-02-15T17:51:15.668682-08:00 rk-xsj kernel: [ 2909.732096] RAX: ffffffffffffffda RBX: 000055808393b6f0 RCX: 00007fb2bc7157c4
2024-02-15T17:51:15.668683-08:00 rk-xsj kernel: [ 2909.732097] RDX: 0000000000000000 RSI: 0000000000000003 RDI: 0000000000000003
2024-02-15T17:51:15.668683-08:00 rk-xsj kernel: [ 2909.732098] RBP: 000055808393b788 R08: 0000000000000000 R09: 0000000000000000
2024-02-15T17:51:15.668683-08:00 rk-xsj kernel: [ 2909.732098] R10: 0000558083999a60 R11: 0000000000000202 R12: 00007fb2bc9ff100
2024-02-15T17:51:15.668683-08:00 rk-xsj kernel: [ 2909.732099] R13: 000055808393b4f0 R14: 0000000100000001 R15: 00005580839336d0
2024-02-15T17:51:15.668683-08:00 rk-xsj kernel: [ 2909.732102]  </TASK>
2024-02-15T17:51:15.668683-08:00 rk-xsj kernel: [ 2909.732103] ---[ end trace 0000000000000000 ]---

Any idea? Is it normal?
At least it does not crash my work laptop. ;-)

@keryell keryell changed the title from "Provides example works but there is kernel error at the end of execution" to "Provided examples work but there is kernel error at the end of execution" Feb 16, 2024
@keryell
Contributor Author

keryell commented Feb 16, 2024

This is with Ubuntu 23.10 and kernel branch iommu_sva_v4_v6.7-rc8 rebased on top of latest v6.7.4 as explained in #3 (comment)

@maxzhen
Collaborator

maxzhen commented Feb 16, 2024

This is a known issue with the IOMMU implementation. The warning can be safely ignored. It should already be fixed in the upcoming 6.8 branch.
We were told that the fix is as simple as:

diff --git a/drivers/iommu/amd/pasid.c b/drivers/iommu/amd/pasid.c
index d1b0e129506f..ad39cc197ac2 100644
--- a/drivers/iommu/amd/pasid.c
+++ b/drivers/iommu/amd/pasid.c
@@ -225,7 +225,8 @@ void amd_iommu_remove_dev_pasid(struct device *dev, ioasid_t pasid)
 	sva_pdom = to_pdomain(domain);

 	/* Ensure that all queued faults have been processed */
-	iopf_queue_flush_dev(dev);
+	if (dev_data->pri_enabled)
+		iopf_queue_flush_dev(dev);

 	spin_lock_irqsave(&sva_pdom->lock, flags);

@maxzhen maxzhen closed this as completed Feb 16, 2024
@keryell
Contributor Author

keryell commented Feb 16, 2024

But this "known" issue was not mentioned in this project, right?
Perhaps I can add this to the FAQ?

@maxzhen
Collaborator

maxzhen commented Feb 16, 2024

We're about to update to 6.8 kernel and it should be fixed. Since this error is harmless, don't bother doing anything for now.

keryell added a commit to keryell/linux that referenced this issue Feb 16, 2024
Apply suggestion from Max Zhen from
amd/xdna-driver#16 (comment)
to avoid:
------------[ cut here ]------------
WARNING: CPU: 9 PID: 42463 at drivers/iommu/io-pgfault.c:249 iopf_queue_flush_dev+0x2f/0x40
Modules linked in: amdxdna(OE) drm_shmem_helper xocl(OE) xclmgmt(OE) hid_logitech_hidpp hid_logitech_dj snd_usb_audio snd_usbmidi_lib snd_ump rfcomm snd_seq_dummy snd_hrtimer xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo xt_addrtype nft_compat nf_tables libcrc32c nfnetlink br_netfilter bridge stp llc ipmi_devintf ipmi_msghandler nvme_fabrics vboxnetadp(OE) vboxnetflt(OE) vboxdrv(OE) ccm overlay cmac algif_hash algif_skcipher af_alg bnep binfmt_misc nls_iso8859_1 intel_rapl_msr joydev intel_rapl_common snd_ctl_led snd_hda_codec_realtek snd_hda_codec_generic snd_sof_amd_acp63 snd_sof_amd_vangogh ledtrig_audio snd_sof_amd_rembrandt snd_sof_amd_renoir snd_sof_amd_acp snd_sof_pci snd_sof_xtensa_dsp snd_hda_codec_hdmi snd_sof snd_hda_intel snd_sof_utils snd_intel_dspcfg snd_intel_sdw_acpi snd_soc_core edac_mce_amd snd_hda_codec mt7921e mt7921_common btusb snd_hda_core snd_compress uvcvideo btrtl mt792x_lib ac97_bus snd_hwdep
 kvm_amd btintel snd_pcm_dmaengine videobuf2_vmalloc mt76_connac_lib btbcm uvc mt76 snd_seq_midi snd_pci_ps btmtk videobuf2_memops snd_seq_midi_event snd_rpl_pci_acp6x videobuf2_v4l2 snd_rawmidi snd_acp_pci mac80211 kvm bluetooth snd_acp_legacy_common videodev snd_pci_acp6x snd_seq snd_pcm irqbypass videobuf2_common ecdh_generic snd_seq_device crct10dif_pclmul crc32_pclmul hid_multitouch ecc mc snd_pci_acp5x snd_timer polyval_clmulni polyval_generic cfg80211 snd_rn_pci_acp3x ghash_clmulni_intel snd_acp_config sha256_ssse3 ucsi_acpi hp_wmi snd sha1_ssse3 r8169 typec_ucsi snd_soc_acpi sparse_keymap rapl platform_profile wmi_bmof thunderbolt k10temp libarc4 realtek soundcore ccp snd_pci_acp3x i2c_piix4 typec nvidia_uvm(POE) i2c_hid_acpi wireless_hotkey i2c_hid amd_pmc msr parport_pc ppdev nfsd lp parport auth_rpcgss nfs_acl lockd grace efi_pstore sunrpc dmi_sysfs ip_tables x_tables autofs4 dm_crypt hid_generic usbhid hid nvidia_drm(POE) nvidia_modeset(POE) nvidia(POE) amdgpu amdxcp drm_exec gpu_sched
 drm_buddy drm_suballoc_helper drm_ttm_helper ttm drm_display_helper cec rc_core drm_kms_helper input_leds nvme video drm serio_raw xhci_pci nvme_core xhci_pci_renesas i2c_algo_bit wmi mac_hid aesni_intel crypto_simd cryptd
CPU: 9 PID: 42463 Comm: example_noop_te Tainted: P        W  OE      6.7.4+iommu-sva-v4+ #1
Hardware name: HP HP ZBook Power 15.6 inch G10 A Mobile Workstation PC/8B95, BIOS V85 Ver. 01.03.00 09/11/2023
RIP: 0010:iopf_queue_flush_dev+0x2f/0x40
Code: 48 8b 87 d0 02 00 00 48 8b 40 20 48 85 c0 74 1a 55 48 8b 40 40 48 8b 38 48 89 e5 e8 8b 79 61 ff 31 c0 5d 31 ff e9 6c 06 80 00 <0f> 0b b8 ed ff ff ff 31 ff e9 5e 06 80 00 0f 1f 00 90 90 90 90 90
RSP: 0018:ffffb19f0faffcc8 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa05d41aeb0c0 RCX: 0000000000000000
 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffa05d41aeb0c0
 RBP: ffffb19f0faffd00 R08: 0000000000000000 R09: 0000000000000000
 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
 R13: ffffa05d41895d80 R14: ffffa0617fc70978 R15: ffffa0617fc70810
 FS:  00007fb2bd043c00(0000) GS:ffffa06c75a40000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00007fb2bc945400 CR3: 000000068f1ac000 CR4: 0000000000750ef0
 PKRU: 55555554
 Call Trace:
  <TASK>
  ? show_regs+0x6d/0x80
  ? __warn+0x89/0x160
  ? iopf_queue_flush_dev+0x2f/0x40
  ? report_bug+0x17e/0x1b0
  ? handle_bug+0x51/0xa0
  ? exc_invalid_op+0x18/0x80
  ? asm_exc_invalid_op+0x1b/0x20
  ? iopf_queue_flush_dev+0x2f/0x40
  ? srso_alias_return_thunk+0x5/0xfbef5
  ? amd_iommu_remove_dev_pasid+0x7d/0x160
  iommu_detach_device_pasid+0x5a/0xa0
  iommu_sva_unbind_device+0x3f/0xa0
  amdxdna_drm_close+0xa5/0x130 [amdxdna]
  drm_file_free+0x1e6/0x260 [drm]
  drm_release+0xc7/0x150 [drm]
  __fput+0x9e/0x2e0
  __fput_sync+0x1c/0x30
  __x64_sys_close+0x3e/0x90
  do_syscall_64+0x5d/0xf0
  ? srso_alias_return_thunk+0x5/0xfbef5
  ? ksys_write+0x73/0x100
  ? srso_alias_return_thunk+0x5/0xfbef5
  ? exit_to_user_mode_prepare+0x39/0x190
  ? srso_alias_return_thunk+0x5/0xfbef5
  ? syscall_exit_to_user_mode+0x37/0x60
  ? srso_alias_return_thunk+0x5/0xfbef5
  ? do_syscall_64+0x6c/0xf0
  ? do_syscall_64+0x6c/0xf0
  ? exc_page_fault+0x94/0x1b0
  entry_SYSCALL_64_after_hwframe+0x6e/0x76
 RIP: 0033:0x7fb2bc7157c4
 Code: 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 80 3d 85 0d 0f 00 00 74 13 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 44 c3 0f 1f 00 48 83 ec 18 89 7c 24 0c e8 13
 RSP: 002b:00007ffd629c61e8 EFLAGS: 00000202 ORIG_RAX: 0000000000000003
 RAX: ffffffffffffffda RBX: 000055808393b6f0 RCX: 00007fb2bc7157c4
 RDX: 0000000000000000 RSI: 0000000000000003 RDI: 0000000000000003
 RBP: 000055808393b788 R08: 0000000000000000 R09: 0000000000000000
 R10: 0000558083999a60 R11: 0000000000000202 R12: 00007fb2bc9ff100
 R13: 000055808393b4f0 R14: 0000000100000001 R15: 00005580839336d0
  </TASK>
 ---[ end trace 0000000000000000 ]---
@keryell
Contributor Author

keryell commented Feb 16, 2024

> We're about to update to 6.8 kernel and it should be fixed. Since this error is harmless, don't bother doing anything for now.

Too late. :-) I have applied your suggested fix and it solves the issue.
Thanks.

keryell added a commit to keryell/linux that referenced this issue Feb 20, 2024
keryell added a commit to keryell/linux that referenced this issue Feb 27, 2024