VER-847: ath12k: peer lookup failure during STA stats retrieval (tentative fix)#75

Merged
adrian-nicolau merged 2 commits into tg-v6.18-ath12k-next from VER-847-3.5-FR2-spider-wifi-unit-stop-responding-till-power-cycle
Mar 12, 2026

@adrian-nicolau

This PR cherry-picks two recent fixes from the upstream ath12k driver. Both address problems with ath12k station statistics requests, and our trace shows the stall inside ath12k_mac_op_sta_statistics, so they may help resolve the issue we are seeing.

However, we have seen the "failed to find the peer with peer_id" error before. At that time it was caused by torvalds/linux@981050b, and it is also reported at https://bugzilla.kernel.org/show_bug.cgi?id=221039 . We reverted that commit then, and if the issue still reproduces after this PR we might need to revert it again.

Qualcomm's driver changes the log level of this error to debug in https://git.codelinaro.org/clo/qsdk/oss/src/mac80211/wlan-open/-/commit/c8f276e7615d3156e07e1ed217ca96a2484ada09

[ 1538.267889] ath12k_wifi7_pci 0002:01:00.0: dp_tx: failed to find the peer with peer_id 3
[ 1549.565115] ath12k_wifi7_pci 0002:01:00.0: dp_tx: failed to find the peer with peer_id 3
[ 1550.022450] ath12k_wifi7_pci 0002:01:00.0: dp_tx: failed to find the peer with peer_id 3
[ 1550.081428] ath12k_wifi7_pci 0002:01:00.0: dp_tx: failed to find the peer with peer_id 3
[ 1550.244533] ath12k_wifi7_pci 0002:01:00.0: dp_tx: failed to find the peer with peer_id 3
[ 1550.255377] ath12k_wifi7_pci 0002:01:00.0: dp_tx: failed to find the peer with peer_id 3
[ 1550.445195] ath12k_wifi7_pci 0002:01:00.0: dp_tx: failed to find the peer with peer_id 3
[ 1550.456415] ath12k_wifi7_pci 0002:01:00.0: dp_tx: failed to find the peer with peer_id 3
[ 1550.642817] ath12k_wifi7_pci 0002:01:00.0: dp_tx: failed to find the peer with peer_id 3
[ 1550.653841] ath12k_wifi7_pci 0002:01:00.0: dp_tx: failed to find the peer with peer_id 3
[ 1551.041720] ath12k_wifi7_pci 0002:01:00.0: dp_tx: failed to find the peer with peer_id 3
[ 1551.053444] ath12k_wifi7_pci 0002:01:00.0: dp_tx: failed to find the peer with peer_id 3
[ 1551.260004] ath12k_wifi7_pci 0002:01:00.0: dp_tx: failed to find the peer with peer_id 3
[ 1551.270678] ath12k_wifi7_pci 0002:01:00.0: dp_tx: failed to find the peer with peer_id 3
[ 1566.278552] ath12k_wifi7_pci 0002:01:00.0: dp_tx: failed to find the peer with peer_id 3
[ 1574.911836] rcu: INFO: rcu_sched self-detected stall on CPU
[ 1574.917441] rcu:     1-....: (209637 ticks this GP) idle=819c/1/0x4000000000000000 softirq=11455/11457 fqs=105024
[ 1574.927491] rcu:     (t=210121 jiffies g=13209 q=1984 ncpus=4)
[ 1574.933091] CPU: 1 UID: 0 PID: 567 Comm: FR2Agent0 Tainted: G           O        6.18.0-yocto-standard-g9e46c59c3a0c #1 NONE 
[ 1574.933097] Tainted: [O]=OOT_MODULE
[ 1574.933098] Hardware name: Siklu N366 (DT)
[ 1574.933100] pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 1574.933104] pc : ath12k_dp_link_peer_find_by_addr+0xe8/0x13c [ath12k]
[ 1574.933121] lr : ath12k_dp_link_peer_find_by_addr+0x11c/0x13c [ath12k]
[ 1574.933135] sp : ffff80008397b4a0
[ 1574.933136] x29: ffff80008397b4a0 x28: ffff80008397b720 x27: ffff00003ac0d000
[ 1574.933142] x26: ffff00002be01e11 x25: ffff00002be01c00 x24: ffff000000de7840
[ 1574.933146] x23: 0000000000000000 x22: ffff00003ad68f40 x21: ffff00003ac0dd30
[ 1574.933151] x20: ffff00002be01e10 x19: ff2001f4fe1802c3 x18: 0000000000000000
[ 1574.933156] x17: 0000000000000000 x16: 0000000000000000 x15: 0000ffff18035440
[ 1574.933160] x14: ffff800080d0e0a8 x13: 0000000000000060 x12: ffff800080c35d68
[ 1574.933165] x11: ffffffffffffffff x10: ffff800080c35c40 x9 : ffffffffffffffff
[ 1574.933170] x8 : ffff8000811ab310 x7 : 0000000011babbbe x6 : 00007dffc0bf65a0
[ 1574.933175] x5 : 3e54fff200000000 x4 : 0000000011babbfc x3 : 8c12982400000000
[ 1574.933179] x2 : 0000000000000002 x1 : ffff00003ac0dd34 x0 : 0000000000000001
[ 1574.933184] Call trace:
[ 1574.933186]  ath12k_dp_link_peer_find_by_addr+0xe8/0x13c [ath12k] (P)
[ 1574.933200]  ath12k_dp_link_peer_get_sta_rate_info_stats+0x60/0x160 [ath12k]
[ 1574.933214]  ath12k_mac_op_sta_statistics+0xa8/0x3d0 [ath12k]
[ 1574.933228]  sta_set_sinfo+0x260/0xe44
[ 1574.933239]  ieee80211_get_station+0x34/0x7c
[ 1574.933243]  nl80211_get_station+0xbc/0x254
[ 1574.933249]  genl_family_rcv_msg_doit.isra.0+0xb8/0x11c
[ 1574.933254]  genl_rcv_msg+0x1b8/0x238
[ 1574.933258]  netlink_rcv_skb+0x54/0x128
[ 1574.933262]  genl_rcv+0x34/0x48
[ 1574.933265]  netlink_unicast+0x1dc/0x2d4
[ 1574.933272]  netlink_sendmsg+0x168/0x39c
[ 1574.933277]  ____sys_sendmsg+0x20c/0x234
[ 1574.933283]  ___sys_sendmsg+0x7c/0xc0
[ 1574.933287]  __sys_sendmsg+0x70/0xd4
[ 1574.933292]  __arm64_sys_sendmsg+0x20/0x28
[ 1574.933297]  invoke_syscall.constprop.0+0x48/0xc8
[ 1574.933304]  do_el0_svc+0x3c/0xb8
[ 1574.933309]  el0_svc+0x3c/0x150
[ 1574.933315]  el0t_64_sync_handler+0xc8/0xdc
[ 1574.933320]  el0t_64_sync+0x170/0x174

Baochen Qiang added 2 commits March 10, 2026 13:16
To get firmware statistics, currently ar->pdev->pdev_id is passed as an
argument to ath12k_mac_get_fw_stats() in ath12k_mac_op_sta_statistics().
For single pdev device like WCN7850, its value is 0 which represents the
SoC pdev id. As a result, WCN7850 firmware sends the same reply to host
twice, which further results in memory leak:

  unreferenced object 0xffff88812e286000 (size 192):
  comm "softirq", pid 0, jiffies 4294981997
  hex dump (first 32 bytes):
    10 a5 40 11 81 88 ff ff 10 a5 40 11 81 88 ff ff  ..@.......@.....
    00 00 00 00 00 00 00 00 80 ff ff ff 33 05 00 00  ............3...
  backtrace (crc cecc8c82):
    __kmalloc_cache_noprof
    ath12k_wmi_tlv_fw_stats_parse
    ath12k_wmi_tlv_iter
    ath12k_wmi_op_rx
    ath12k_htc_rx_completion_handler
    ath12k_ce_per_engine_service
    ath12k_pci_ce_workqueue
    process_one_work
    bh_worker
    tasklet_action
    handle_softirqs

Detailed explanation is:

  1. ath12k_mac_get_fw_stats() called in ath12k_mac_op_sta_statistics() to
     get vdev statistics, making the caller thread wait.
  2. firmware sends the first reply, ath12k_wmi_tlv_fw_stats_data_parse()
     allocates buffers to cache necessary information. Following that, in
     ath12k_wmi_fw_stats_process() if events of all started vdevs have been
     received, the is_end flag is set, hence the waiting thread gets woken
     up by the ar->fw_stats_done/->fw_stats_complete signals.
  3. ath12k_mac_get_fw_stats() wakes up and returns successfully.
     ath12k_mac_op_sta_statistics() saves required parameters and calls
     ath12k_fw_stats_reset() to free buffers allocated earlier.
  4. firmware sends the second reply. As usual, buffers are allocated and
     attached to the ar->fw_stats.vdevs list. Note this time there is no
     thread waiting, therefore no chance to free those buffers.
  5. ath12k module gets unloaded. If there has been no more firmware
     statistics request made since step 4, or if the request fails (see
     the example in the following patch), there is no chance to call
     ath12k_fw_stats_reset(). Consequently those buffers leak.
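
The five steps above can be modelled in user space. This is only a minimal sketch, not driver code: `struct fw_stats_model`, `model_fw_reply()`, `model_fw_stats_reset()` and `model_cached_count()` are hypothetical names that stand in for ar->fw_stats.vdevs, the WMI event path and ath12k_fw_stats_reset(). It assumes the only thing that matters is that every reply allocates a buffer while only a woken caller frees them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for one cached vdev statistics buffer. */
struct stats_buf {
	struct stats_buf *next;
};

/* Hypothetical stand-in for ar->fw_stats.vdevs plus the waiter state. */
struct fw_stats_model {
	struct stats_buf *vdevs;  /* buffers cached from firmware replies */
	int waiter_present;       /* nonzero while a caller is blocked */
};

/*
 * Model one firmware reply: a buffer is always allocated and queued,
 * whether or not anyone is waiting. Returns 1 if a waiter was woken,
 * 0 if nobody was waiting, -1 on allocation failure.
 */
int model_fw_reply(struct fw_stats_model *m)
{
	struct stats_buf *buf;
	int woke;

	assert(m != NULL);
	buf = malloc(sizeof(*buf));
	if (!buf)
		return -1;
	buf->next = m->vdevs;
	m->vdevs = buf;

	woke = m->waiter_present;
	m->waiter_present = 0;
	return woke ? 1 : 0;
}

/* Model ath12k_fw_stats_reset(): free everything cached so far. */
void model_fw_stats_reset(struct fw_stats_model *m)
{
	assert(m != NULL);
	while (m->vdevs) {
		struct stats_buf *next = m->vdevs->next;

		free(m->vdevs);
		m->vdevs = next;
	}
}

/* Count buffers still cached; nonzero at unload time means a leak. */
size_t model_cached_count(const struct fw_stats_model *m)
{
	const struct stats_buf *b;
	size_t n = 0;

	assert(m != NULL);
	for (b = m->vdevs; b; b = b->next)
		n++;
	return n;
}
```

In this model, a first reply with a waiter present followed by a reset leaves nothing cached (steps 2 and 3), while a second reply with no waiter leaves one buffer behind (step 4) that is only reclaimed if something calls the reset again.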

Actually for single pdev device, using SoC pdev id in
ath12k_mac_op_sta_statistics() is wrong, because the purpose is to get
statistics of a specific station, which is mapped to a specific pdev. That
is, the id of the actual individual pdev should be fetched and used
instead. The helper ath12k_mac_get_target_pdev_id() serves this purpose,
hence use it to fix this issue. Note it also works for other devices due
to the single_pdev_only check inside.

The same applies to ath12k_mac_op_get_txpower() and
ath12k_mac_op_link_sta_statistics() as well.
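
The selection rule described above might be pictured as follows. This is a simplified, hypothetical model and not the body of ath12k_mac_get_target_pdev_id(): `struct radio_model`, its `active_fw_pdev_id` field and `model_target_pdev_id()` are assumptions made for illustration, and how the driver actually derives the individual pdev id is not shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced view of the per-radio state. */
struct radio_model {
	bool single_pdev_only;      /* true for devices such as WCN7850 */
	uint8_t pdev_id;            /* stand-in for ar->pdev->pdev_id */
	uint8_t active_fw_pdev_id;  /* assumed: id of the pdev actually in use */
};

/*
 * Pick the pdev id to put in a per-station statistics request.
 * Multi-pdev devices keep using their own pdev id. Single-pdev devices
 * must not use pdev_id, because there it is the SoC-wide id (0) and the
 * commit message says firmware then answers twice.
 */
uint8_t model_target_pdev_id(const struct radio_model *r)
{
	assert(r != NULL);

	if (!r->single_pdev_only)
		return r->pdev_id;

	return r->active_fw_pdev_id;
}
```

The early return for the multi-pdev case mirrors the statement that the helper "also works for other devices" thanks to its single_pdev_only check.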

Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.1.c5-00302-QCAHMTSWPL_V1.0_V2.0_SILICONZ-1.115823.3

Fixes: 79e7b04 ("wifi: ath12k: report station mode signal strength")
Fixes: e92c658 ("wifi: ath12k: add get_txpower mac ops")
Fixes: ebebe66 ("wifi: ath12k: fill link station statistics for MLO")
Signed-off-by: Baochen Qiang <baochen.qiang@oss.qualcomm.com>
Reviewed-by: Vasanthakumar Thiagarajan <vasanthakumar.thiagarajan@oss.qualcomm.com>
Link: https://patch.msgid.link/20260129-ath12k-fw-stats-fixes-v1-1-55d66064f4d5@oss.qualcomm.com
Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>

In ath12k_wmi_tlv_fw_stats_data_parse() and
ath12k_wmi_tlv_rssi_chain_parse(), the driver uses
ieee80211_find_sta_by_ifaddr() to look up the station associated with the
incoming firmware statistics. This works under normal conditions but fails
during AP disconnection, resulting in log messages like:

 wlan0: deauthenticating from xxxxxx by local choice (Reason: 3=DEAUTH_LEAVING)
 wlan0: moving STA xxxxxx to state 3
 wlan0: moving STA xxxxxx to state 2
 wlan0: moving STA xxxxxx to state 1
 ath12k_pci 0000:02:00.0: not found station bssid xxxxxx for vdev stat
 ath12k_pci 0000:02:00.0: not found station of bssid xxxxxx for rssi chain
 ath12k_pci 0000:02:00.0: failed to pull fw stats: -71
 ath12k_pci 0000:02:00.0: time out while waiting for get fw stats
 wlan0: Removed STA xxxxxx
 wlan0: Destroyed STA xxxxxx

The failure happens because the station has already been removed from
ieee80211_local::sta_hash by the time firmware statistics are requested
through drv_sta_statistics().

Switch the lookup to ath12k_link_sta_find_by_addr(), which searches the
driver's link station hash table that still has the station recorded
at that time. This also implicitly fixes another issue: the current code
always uses deflink regardless of which link the statistics belong to,
which is incorrect in MLO scenarios. The new helper returns the correct
link station.
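
The ordering problem can be illustrated with two independent lookup tables. This is a toy user-space sketch under stated assumptions: `struct sta_table` and the `sta_table_*()` functions are hypothetical and stand in for ieee80211_local::sta_hash and the driver's link station hash table. It only shows that a lookup keyed on the table that is emptied first fails while the other still succeeds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_ADDR_LEN 6
#define MODEL_MAX_STA  4

/* Hypothetical fixed-size table keyed by MAC address. */
struct sta_table {
	uint8_t addr[MODEL_MAX_STA][MODEL_ADDR_LEN];
	bool used[MODEL_MAX_STA];
};

/* Return the slot holding addr, or -1 if it is not present. */
static int sta_table_slot(const struct sta_table *t, const uint8_t *addr)
{
	int i;

	for (i = 0; i < MODEL_MAX_STA; i++)
		if (t->used[i] && !memcmp(t->addr[i], addr, MODEL_ADDR_LEN))
			return i;
	return -1;
}

/* Record a station; returns false when the table is full. */
bool sta_table_add(struct sta_table *t, const uint8_t *addr)
{
	int i;

	assert(t != NULL && addr != NULL);
	for (i = 0; i < MODEL_MAX_STA; i++) {
		if (!t->used[i]) {
			memcpy(t->addr[i], addr, MODEL_ADDR_LEN);
			t->used[i] = true;
			return true;
		}
	}
	return false;
}

/* Drop a station; returns false when it was not present. */
bool sta_table_del(struct sta_table *t, const uint8_t *addr)
{
	int slot;

	assert(t != NULL && addr != NULL);
	slot = sta_table_slot(t, addr);
	if (slot < 0)
		return false;
	t->used[slot] = false;
	return true;
}

/* Lookup used when a firmware statistics event arrives. */
bool sta_table_has(const struct sta_table *t, const uint8_t *addr)
{
	assert(t != NULL && addr != NULL);
	return sta_table_slot(t, addr) >= 0;
}
```

With one table playing the stack's hash and another the driver's, deleting the address from the first and then looking it up in both reproduces the situation in the log above: the stack-side lookup misses, the driver-side lookup still hits.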

Additionally, raise the log level on lookup failures. With the updated
helper, such failures should no longer occur under normal conditions.

Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.1.c5-00302-QCAHMTSWPL_V1.0_V2.0_SILICONZ-1.115823.3

Fixes: 79e7b04 ("wifi: ath12k: report station mode signal strength")
Fixes: 6af5bc3 ("wifi: ath12k: report station mode per-chain signal strength")
Signed-off-by: Baochen Qiang <baochen.qiang@oss.qualcomm.com>
Reviewed-by: Vasanthakumar Thiagarajan <vasanthakumar.thiagarajan@oss.qualcomm.com>
Link: https://patch.msgid.link/20260129-ath12k-fw-stats-fixes-v1-2-55d66064f4d5@oss.qualcomm.com
Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
@adrian-nicolau adrian-nicolau merged commit e06c9b0 into tg-v6.18-ath12k-next Mar 12, 2026
1 check passed
@adrian-nicolau adrian-nicolau deleted the VER-847-3.5-FR2-spider-wifi-unit-stop-responding-till-power-cycle branch March 12, 2026 10:21