intel_gpu_top always killed after kernel 6.10.10 and causing a kernel panic #706

Closed
1 of 2 tasks
meduk0 opened this issue Sep 17, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@meduk0

meduk0 commented Sep 17, 2024

NVIDIA Open GPU Kernel Modules Version

560.35.03

Please confirm this issue does not happen with the proprietary driver (of the same version). This issue tracker is only for bugs specific to the open kernel driver.

  • I confirm that this does not happen with the proprietary driver package.

Operating System and Version

6.11.0-2-cachyos

Kernel Release

6.10.10 -> 6.11.0-2

Please confirm you are running a stable release kernel (e.g. not a -rc). We do not accept bug reports for unreleased kernels.

  • I am running on a stable kernel release.

Hardware: GPU

rtx 3050 laptop 6gb

Describe the bug

> sudo intel_gpu_top                                            fish-1 | 0 (0.030s) < 09:34:39
[sudo] password for meduko:
fish: Job 1, 'sudo intel_gpu_top' terminated by signal SIGKILL (Forced quit)
> sudo  dmesg 
[ 2462.104713] BUG: kernel NULL pointer dereference, address: 0000000000000035
[ 2462.104726] #PF: supervisor read access in kernel mode
[ 2462.104731] #PF: error_code(0x0000) - not-present page
[ 2462.104735] PGD 0 P4D 0
[ 2462.104743] Oops: Oops: 0000 [#3] PREEMPT SMP NOPTI
[ 2462.104752] CPU: 15 UID: 0 PID: 25352 Comm: intel_gpu_top Tainted: G     UD    OE      6.11.0-2-cachyos #1 4e8f3da2cd7dfb196c69514fe8b3b64a033481c6
[ 2462.104765] Tainted: [U]=USER, [D]=DIE, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
[ 2462.104768] Hardware name: Dell Inc. Dell G15 5530/04TT83, BIOS 1.19.0 08/16/2024
[ 2462.104773] Sched_ext: lavd (enabled+all), task: runnable_at=-3ms
[ 2462.104777] RIP: 0010:clkpm_show+0x47/0x70
[ 2462.104795] Code: 43 48 2d c8 00 00 00 48 8b 50 10 48 8b 42 10 48 85 c0 74 16 48 8b 42 38 48 85 c0 74 0d 80 78 6c 00 74 2a 48 8b 80 b8 00 00 00 <0f> b6 50 35 48 c7 c6 6c c4 a5 b5 83 e2 01 e8 46 ed d0 ff 48 98 c3
[ 2462.104800] RSP: 0018:ffffadf7ce943c48 EFLAGS: 00010202
[ 2462.104807] RAX: 0000000000000000 RBX: ffffffffb6350120 RCX: ffffffffb6350120
[ 2462.104811] RDX: ffff893dc20a4000 RSI: ffffffffb6350120 RDI: ffff893e4d481000
[ 2462.104817] RBP: ffffffffb555a010 R08: ffff893dc20ba0c8 R09: ffff893ed0422600
[ 2462.104821] R10: ffffadf7ce943c80 R11: 0000000000001000 R12: ffffadf7ce943d00
[ 2462.104824] R13: ffffadf7ce943cd8 R14: 0000000000000001 R15: 0000000000000001
[ 2462.104829] FS:  00007846fbe03740(0000) GS:ffff89413f780000(0000) knlGS:0000000000000000
[ 2462.104834] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2462.104839] CR2: 0000000000000035 CR3: 000000026b7b2000 CR4: 0000000000f50ef0
[ 2462.104843] PKRU: 55555554
[ 2462.104847] Call Trace:
[ 2462.104853]  <TASK>
[ 2462.104859]  ? __die_body.cold+0x8/0x12
[ 2462.104870]  ? page_fault_oops+0x15a/0x2e0
[ 2462.104880]  ? exc_page_fault+0x81/0x190
[ 2462.104887]  ? asm_exc_page_fault+0x26/0x30
[ 2462.104899]  ? clkpm_show+0x47/0x70
[ 2462.104905]  dev_attr_show+0x19/0x40
[ 2462.104916]  sysfs_kf_seq_show+0xa8/0xf0
[ 2462.104925]  seq_read_iter+0x11e/0x470
[ 2462.104933]  vfs_read+0x344/0x470
[ 2462.104942]  __x64_sys_read+0x72/0xf0
[ 2462.104950]  do_syscall_64+0x82/0x190
[ 2462.104962]  ? __x64_sys_openat+0x1f5/0x230
[ 2462.104970]  ? syscall_exit_to_user_mode+0x10/0x1e0
[ 2462.104976]  ? do_syscall_64+0x8e/0x190
[ 2462.104983]  ? exc_page_fault+0x81/0x190
[ 2462.104989]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 2462.104996] RIP: 0033:0x7846fc008861
[ 2462.105070] Code: ff ff eb c3 67 e8 5f ba 01 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 f3 0f 1e fa 80 3d 05 98 0e 00 00 74 13 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 57 c3 66 0f 1f 44 00 00 48 83 ec 28 48 89 54
[ 2462.105075] RSP: 002b:00007ffe58d049e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 2462.105082] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007846fc008861
[ 2462.105086] RDX: 0000000000001008 RSI: 00005aae2f56a830 RDI: 0000000000000003
[ 2462.105089] RBP: 00007ffe58d04ae0 R08: 0000000000000001 R09: 000000000000000f
[ 2462.105093] R10: 0000000000000000 R11: 0000000000000246 R12: 00005aae2f56a830
[ 2462.105097] R13: 0000000000001008 R14: 0000000000001008 R15: 0000000000001007
[ 2462.105103]  </TASK>
[ 2462.105106] Modules linked in: ccm snd_seq_dummy snd_hrtimer snd_seq snd_seq_device snd_soc_skl_hda_dsp snd_soc_hdac_hdmi snd_sof_probes snd_soc_intel_hda_dsp_common snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_hda_scodec_component snd_soc_dmic snd_sof_pci_intel_tgl snd_sof_pci_intel_cnl snd_sof_intel_hda_generic dell_pc snd_ctl_led platform_profile soundwire_intel soundwire_cadence snd_sof_intel_hda_common snd_sof_intel_hda_mlink snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp snd_sof intel_uncore_frequency intel_uncore_frequency_common snd_sof_utils snd_soc_hdac_hda intel_tcc_cooling snd_soc_acpi_intel_match x86_pkg_temp_thermal soundwire_generic_allocation intel_powerclamp snd_soc_acpi soundwire_bus coretemp iwlmvm snd_soc_avs kvm_intel snd_soc_hda_codec snd_hda_ext_core mac80211 kvm snd_soc_core snd_compress ac97_bus libarc4 crct10dif_pclmul snd_pcm_dmaengine crc32_pclmul ptp hid_sensor_custom snd_hda_intel polyval_clmulni polyval_generic pps_core snd_intel_dspcfg hid_sensor_hub
[ 2462.105200]  hid_multitouch snd_intel_sdw_acpi ghash_clmulni_intel sha512_ssse3 snd_hda_codec sha1_ssse3 intel_ishtp_hid dell_laptop aesni_intel spd5118 snd_hda_core gf128mul dell_wmi processor_thermal_device_pci crypto_simd snd_hwdep intel_rapl_msr iwlwifi processor_thermal_device cryptd processor_thermal_wt_hint snd_pcm mei_hdcp rapl mei_pxp processor_thermal_rfim joydev vfat dell_smbios mousedev dcdbas fat intel_cstate intel_uncore snd_timer spi_nor dell_wmi_sysman ucsi_acpi processor_thermal_rapl pcspkr dell_smm_hwmon firmware_attributes_class dell_wmi_ddv alienware_wmi psmouse dell_wmi_descriptor wmi_bmof i2c_i801 intel_rapl_common snd typec_ucsi cfg80211 mtd i2c_smbus intel_lpss_pci processor_thermal_wt_req nvidia_wmi_ec_backlight typec i2c_mux soundcore intel_ish_ipc intel_lpss processor_thermal_power_floor processor_thermal_mbox idma64 rfkill roles intel_ishtp intel_pmc_core intel_vsec int3403_thermal i2c_hid_acpi intel_hid pmt_telemetry int3400_thermal ip6t_REJECT i2c_hid int340x_thermal_zone pmt_class
[ 2462.105312]  pinctrl_alderlake acpi_pad sparse_keymap acpi_tad acpi_thermal_rel nf_reject_ipv6 mei_me mei xt_hl mac_hid ip6t_rt lz4 lz4_compress ipt_REJECT nf_reject_ipv4 xt_LOG nf_log_syslog nft_limit xt_limit xt_addrtype xt_tcpudp xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_compat nf_tables pkcs8_key_parser i2c_dev crypto_user acpi_call(OE) dm_mod loop nfnetlink zram ip_tables x_tables nvidia_uvm(OE) xe gpu_sched drm_suballoc_helper drm_gpuvm drm_exec nvidia_drm(OE) nvidia_modeset(OE) btrfs hid_generic usbhid blake2b_generic libcrc32c crc32c_generic serio_raw xor raid6_pq atkbd nvidia(OE) libps2 nvme vivaldi_fmap nvme_core i915 sha256_ssse3 spi_intel_pci xhci_pci drm_ttm_helper spi_intel nvme_auth xhci_pci_renesas i8042 serio i2c_algo_bit drm_buddy video wmi ttm drm_display_helper cec intel_agp intel_gtt crc32c_intel
[ 2462.105432] CR2: 0000000000000035
[ 2462.105438] ---[ end trace 0000000000000000 ]---
[ 2462.105441] RIP: 0010:clkpm_show+0x47/0x70
[ 2462.105447] Code: 43 48 2d c8 00 00 00 48 8b 50 10 48 8b 42 10 48 85 c0 74 16 48 8b 42 38 48 85 c0 74 0d 80 78 6c 00 74 2a 48 8b 80 b8 00 00 00 <0f> b6 50 35 48 c7 c6 6c c4 a5 b5 83 e2 01 e8 46 ed d0 ff 48 98 c3
[ 2462.105452] RSP: 0018:ffffadf7ce94b9f8 EFLAGS: 00010202
[ 2462.105457] RAX: 0000000000000000 RBX: ffffffffb6350120 RCX: ffffffffb6350120
[ 2462.105461] RDX: ffff893dc20a4000 RSI: ffffffffb6350120 RDI: ffff893dcd387000
[ 2462.105464] RBP: ffffffffb555a010 R08: ffff893dc20ba0c8 R09: ffff893dddb25380
[ 2462.105467] R10: ffffadf7ce94ba30 R11: 0000000000001000 R12: ffffadf7ce94bab0
[ 2462.105471] R13: ffffadf7ce94ba88 R14: 0000000000000001 R15: 0000000000000001
[ 2462.105474] FS:  00007846fbe03740(0000) GS:ffff89413f780000(0000) knlGS:0000000000000000
[ 2462.105479] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2462.105483] CR2: 0000000000000035 CR3: 000000026b7b2000 CR4: 0000000000f50ef0
[ 2462.105487] PKRU: 55555554
[ 2462.105490] note: intel_gpu_top[25352] exited with irqs disabled
prime-run inxi -Fzx                                           fish-1 | 1 (2.289s) < 09:28:03
System:
  Kernel: 6.11.0-2-cachyos arch: x86_64 bits: 64 compiler: gcc v: 14.2.1
  Desktop: KDE Plasma v: 6.1.5 Distro: CachyOS base: Arch Linux
Machine:
  Type: Laptop System: Dell product: Dell G15 5530 v: N/A
    serial: <superuser required>
  Mobo: Dell model: 04TT83 v: A00 serial: <superuser required> UEFI: Dell
    v: 1.19.0 date: 08/16/2024
Battery:
  ID-1: BAT0 charge: 26.6 Wh (65.7%) condition: 40.5/54.9 Wh (73.8%)
    volts: 11.6 min: 11.4 model: BYD DELL DVG8M33 status: not charging
CPU:
  Info: 10-core (6-mt/4-st) model: 13th Gen Intel Core i5-13450HX bits: 64
    type: MST AMCP arch: Raptor Lake rev: 2 cache: L1: 864 KiB L2: 9.5 MiB
    L3: 20 MiB
  Speed (MHz): avg: 800 min/max: 800/2400:1800 cores: 1: 800 2: 800 3: 800
    4: 800 5: 800 6: 800 7: 800 8: 800 9: 800 10: 800 11: 800 12: 800 13: 800
    14: 800 15: 800 16: 800 bogomips: 83558
  Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx
Graphics:
  Device-1: Intel Alder Lake-S [UHD Graphics] vendor: Dell driver: i915
    v: kernel arch: Gen-12.2 bus-ID: 00:02.0
  Device-2: NVIDIA GN20-P0-R-K2 [GeForce RTX 3050 6GB Laptop GPU]
    vendor: Dell driver: nvidia v: 560.35.03 arch: Ampere bus-ID: 01:00.0
  Display: wayland server: X.org v: 1.21.1.13 with: Xwayland v: 24.1.2
    compositor: kwin_wayland driver: N/A resolution: 1920x1080
  API: EGL v: 1.5 drivers: nvidia platforms:
    active: gbm,wayland,x11,surfaceless inactive: N/A
  API: OpenGL v: 4.6.0 vendor: nvidia v: 560.35.03 glx-v: 1.4
    direct-render: yes renderer: NVIDIA GeForce RTX 3050 6GB Laptop
    GPU/PCIe/SSE2
  API: Vulkan v: 1.3.295 drivers: nvidia surfaces: xcb,xlib,wayland
    devices: 1
Audio:
  Device-1: Intel Raptor Lake High Definition Audio vendor: Dell
    driver: sof-audio-pci-intel-tgl bus-ID: 00:1f.3
  API: ALSA v: k6.11.0-2-cachyos status: kernel-api
  Server-1: JACK v: 1.9.22 status: off
  Server-2: PipeWire v: 1.2.3 status: active
Network:
  Device-1: Intel Raptor Lake-S PCH CNVi WiFi driver: iwlwifi v: kernel
    bus-ID: 00:14.3
  IF: wlan0 state: up mac: <filter>
Drives:
  Local Storage: total: 476.94 GiB used: 115.86 GiB (24.3%)
  ID-1: /dev/nvme0n1 vendor: SK Hynix model: BC901 NVMe 512GB
    size: 476.94 GiB temp: 29.9 C
Partition:
  ID-1: / size: 416.02 GiB used: 115.38 GiB (27.7%) fs: btrfs
    dev: /dev/nvme0n1p5
  ID-2: /boot size: 2.33 GiB used: 491.9 MiB (20.7%) fs: vfat
    dev: /dev/nvme0n1p6
  ID-3: /home size: 416.02 GiB used: 115.38 GiB (27.7%) fs: btrfs
    dev: /dev/nvme0n1p5
  ID-4: /var/log size: 416.02 GiB used: 115.38 GiB (27.7%) fs: btrfs
    dev: /dev/nvme0n1p5
  ID-5: /var/tmp size: 416.02 GiB used: 115.38 GiB (27.7%) fs: btrfs
    dev: /dev/nvme0n1p5
Swap:
  ID-1: swap-1 type: zram size: 15.31 GiB used: 1024 KiB (0.0%)
    dev: /dev/zram0
Sensors:
  System Temperatures: cpu: 34.0 C mobo: 34.0 C sodimm: SODIMM C
  Fan Speeds (rpm): cpu: 0 fan-1: 0
Info:
  Memory: total: 16 GiB note: est. available: 15.31 GiB used: 3.04 GiB (19.9%)
  Processes: 347 Uptime: 34m Init: systemd
  Packages: 1484 Compilers: clang: 18.1.8 gcc: 14.2.1 Shell: prime-run
    inxi: 3.3.35

To Reproduce

Boot kernel 6.10.10 or 6.11.0-2 on Arch Linux, EndeavourOS, or CachyOS. After a proper installation of the Intel driver and the NVIDIA open driver, run sudo intel_gpu_top. The process is killed (fish: Job 1, 'sudo intel_gpu_top' terminated by signal SIGKILL (Forced quit)) and the kernel logs the NULL pointer dereference shown in the dmesg output above. The problem persists after a reboot or a reinstall. It never happened before.

Bug Incidence

Always

nvidia-bug-report.log.gz


More Info

- Other than this, performance is stable (for now); I usually play Battlefield 1 and use Blender.
- According to intel_gpu_top, this is an old bug that has returned.

  • The closed-source NVIDIA driver has never worked on this laptop from version 490 onward (it works only on Fedora with akmod-nvidia), which is why I never tick the checkbox about the proprietary driver.
meduk0 added the bug label Sep 17, 2024
@mtijanic
Collaborator

Hi! Forgive the obvious question, but what happens if the nvidia driver is not loaded (or even installed) and you run the tool?
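One way to confirm that the test really ran without the NVIDIA driver is to inspect `/proc/modules` before launching the tool. The sketch below is only an illustration under stated assumptions: the helper names are hypothetical, and the set of module names is taken from the "Modules linked in" list in the report, so other driver packages may load different modules.

```python
from pathlib import Path

# Module names as they appear in the "Modules linked in" list of this report.
NVIDIA_MODULES = {"nvidia", "nvidia_drm", "nvidia_modeset", "nvidia_uvm"}

def parse_loaded_modules(proc_modules_text: str) -> set:
    """Return the module names from /proc/modules-style text.

    The first whitespace-separated field of each line is the module name.
    """
    return {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }

def nvidia_modules_loaded(proc_modules_text: str) -> set:
    """Return which of the known NVIDIA modules are currently loaded."""
    return parse_loaded_modules(proc_modules_text) & NVIDIA_MODULES

def read_proc_modules(path: str = "/proc/modules") -> str:
    """Read the live module list (Linux only)."""
    return Path(path).read_text()
```

On the affected machine, `nvidia_modules_loaded(read_proc_modules())` returning an empty set would indicate that none of these modules is loaded when `intel_gpu_top` is started.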

@meduk0
Author

meduk0 commented Sep 17, 2024

Sorry, it turns out that force_pcme, a kernel parameter, is what causes this issue. The NVIDIA card doesn't support it, and forcing it causes stutter, a driver panic in the hybrid power system, and longer response times from the NVIDIA driver.
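To check whether a given parameter is present on the running kernel's command line, `/proc/cmdline` can be split into tokens. This is a rough sketch: `parse_cmdline` is a hypothetical helper, it does not handle quoted values that contain spaces, and the exact spelling and syntax of the parameter referred to as force_pcme above is not given in this thread, so the name to look up has to be supplied by the reader.

```python
from pathlib import Path
from typing import Dict, Optional

def parse_cmdline(cmdline: str) -> Dict[str, Optional[str]]:
    """Split a kernel command line into {name: value} pairs.

    Bare flags such as "quiet" map to None. Quoted values containing
    spaces are not handled in this sketch.
    """
    params: Dict[str, Optional[str]] = {}
    for token in cmdline.split():
        # Split on the first "=" only, so "root=UUID=abcd" keeps its value.
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params

def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the command line of the running kernel (Linux only)."""
    return Path(path).read_text()
```

For example, `"some_param" in parse_cmdline(read_cmdline())` reports whether a parameter named `some_param` (a placeholder) was passed at boot.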

meduk0 closed this as completed Sep 17, 2024