[WIP] [Deepin-Kernel-SIG] [linux 6.6-y] [Upstream] mm/vmscan: don't try to reclaim hwpoison folio #896
base: linux-6.6.y
Conversation
Reviewer's Guide

This PR introduces a folio_contain_hwpoisoned_page helper and integrates it into the vmscan reclaim path so that hwpoisoned folios are unconditionally skipped and unmapped during shrink operations, preventing BUGs when reclaiming poisoned pages.

Sequence diagram for handling hwpoisoned folios during shrink_folio_list:

```mermaid
sequenceDiagram
    participant shrink_folio_list
    participant folio
    participant folio_contain_hwpoisoned_page
    participant unmap_poisoned_folio
    shrink_folio_list->>folio: folio_trylock(folio)
    alt folio is locked
        shrink_folio_list->>folio_contain_hwpoisoned_page: check if folio is hwpoisoned
        alt folio is hwpoisoned
            shrink_folio_list->>unmap_poisoned_folio: unmap_poisoned_folio(folio, folio_pfn(folio), false)
            shrink_folio_list->>folio: folio_unlock(folio)
            shrink_folio_list->>folio: folio_put(folio)
            Note right of shrink_folio_list: Continue to next folio
        else folio is not hwpoisoned
            shrink_folio_list->>folio: continue normal reclaim
        end
    else folio is not locked
        shrink_folio_list->>shrink_folio_list: keep folio
    end
```
Class diagram for the folio_contain_hwpoisoned_page helper and vmscan changes:

```mermaid
classDiagram
    class folio {
        +bool folio_test_hwpoison()
        +bool folio_test_large()
        +bool folio_test_has_hwpoisoned()
        +void folio_unlock()
        +void folio_put()
    }
    class shrink_folio_list {
        +unsigned int shrink_folio_list(struct list_head *folio_list, ...)
    }
    class folio_contain_hwpoisoned_page {
        +bool folio_contain_hwpoisoned_page(struct folio *folio)
    }
    class unmap_poisoned_folio {
        +void unmap_poisoned_folio(struct folio *folio, unsigned long pfn, bool)
    }
    folio_contain_hwpoisoned_page ..> folio : uses
    shrink_folio_list ..> folio_contain_hwpoisoned_page : calls
    shrink_folio_list ..> unmap_poisoned_folio : calls
    shrink_folio_list ..> folio : uses
```
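For concreteness, here is a minimal C sketch of the branch the diagrams describe, as it would sit in the shrink_folio_list() loop in mm/vmscan.c. The placement and comments are illustrative, not the verbatim diff:

```c
	/*
	 * Hwpoisoned folios must not go through normal reclaim: doing so
	 * can hit VM_BUG_ON_FOLIO in add_to_swap(), because memory-failure
	 * handling has already cleared the uptodate flag.
	 */
	if (folio_contain_hwpoisoned_page(folio)) {
		/*
		 * The folio may still be mapped if hwpoison_user_mappings()
		 * has not run yet, so unmap it here before dropping it.
		 */
		unmap_poisoned_folio(folio, folio_pfn(folio), false);
		folio_unlock(folio);
		folio_put(folio);
		continue;	/* move on to the next folio on the list */
	}
```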
Pull Request Overview
This PR fixes a kernel BUG by skipping and unmapping hwpoisoned folios during memory reclamation in the vmscan subsystem.
- Skip hwpoisoned folios in shrink_folio_list to prevent VM_BUG_ON errors.
- Introduce the folio_contain_hwpoisoned_page helper and corresponding unmap call.
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| mm/vmscan.c | Adds a check to unmap and release hwpoisoned folios during shrink_folio_list. |
| include/linux/page-flags.h | Introduces the folio_contain_hwpoisoned_page helper to detect hwpoison conditions. |
Comments suppressed due to low confidence (2)
mm/vmscan.c:1744

- Consider adding an inline comment to explain why unmapping and releasing the folio is necessary for hwpoisoned pages in this reclaim path.

  `if (folio_contain_hwpoisoned_page(folio)) {`

include/linux/page-flags.h:1042

- [nitpick] Consider renaming the function to 'folio_contains_hwpoisoned_page' for improved grammatical clarity.

  `static inline bool folio_contain_hwpoisoned_page(struct folio *folio)`
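Given the signature quoted above and the folio predicates listed in the class diagram, the helper plausibly reduces to the following sketch (an assumption about the exact body, not a quote of the patch):

```c
/*
 * Sketch of the helper in include/linux/page-flags.h: a folio counts as
 * containing a hwpoisoned page if the folio itself carries the hwpoison
 * flag, or if it is a large folio with at least one poisoned subpage.
 */
static inline bool folio_contain_hwpoisoned_page(struct folio *folio)
{
	return folio_test_hwpoison(folio) ||
		(folio_test_large(folio) && folio_test_has_hwpoisoned(folio));
}
```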
Hey @opsiff - I've reviewed your changes - here's some feedback:
- Rename folio_contain_hwpoisoned_page to folio_contains_hwpoisoned_page to match the common ‘contains’ naming convention.
- Add a brief comment above the hwpoisoned folio branch in shrink_folio_list to explain why we unmap-and-skip these folios for future maintainers.
- Consider moving the folio_contain_hwpoisoned_page helper into the mm/hwpoison subsystem (or a more relevant header) alongside other hwpoison utilities.
Force-pushed from a88abcd to 1995645
Patch series "Unify vma_address and vma_pgoff_address". The current vma_address() pretends that the ambiguity between head & tail page is an advantage. If you pass a head page to vma_address(), it will operate on all pages in the folio, while if you pass a tail page, it will operate on a single page. That's not what any of the callers actually want, so first convert all callers to use vma_pgoff_address() and then rename vma_pgoff_address() to vma_address(). This patch (of 3): If 'page' is the first page of a large folio then vma_address() will scan for any page in the entire folio. This can lead to page_mapped_in_vma() returning true if some of the tail pages are mapped and the head page is not. This could lead to memory failure choosing to kill a task unnecessarily. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Signed-off-by: Andrew Morton <[email protected]> (cherry picked from commit 7e83474)
Convert the three remaining callers to call vma_pgoff_address() directly. This removes an ambiguity where we'd check just one page if passed a tail page and all N pages if passed a head page. Also add better kernel-doc for vma_pgoff_address().

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
(cherry picked from commit 412ad5f)
With all callers converted, we can use the nice shorter name. Take this opportunity to reorder the arguments to the logical order (larger object first).

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
(cherry picked from commit e0abfbb)
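To illustrate what the unified helper computes, here is a simplified sketch of the pgoff-to-address conversion with its bounds check; it approximates the mm/internal.h logic and is not the verbatim source:

```c
/*
 * Simplified sketch of vma_address(vma, pgoff, nr_pages) after the
 * rename: translate a page-offset range into the user virtual address
 * where it is mapped in this VMA, or -EFAULT if it doesn't fit.
 */
static unsigned long vma_address(struct vm_area_struct *vma,
				 pgoff_t pgoff, unsigned long nr_pages)
{
	unsigned long address;

	if (pgoff < vma->vm_pgoff)
		return -EFAULT;		/* range starts before this mapping */

	address = vma->vm_start + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
	if (address + (nr_pages << PAGE_SHIFT) > vma->vm_end)
		return -EFAULT;		/* range runs past the end of the VMA */

	return address;
}
```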
Patch series "Some cleanups for memory-failure", v3. A lot of folio conversions, plus some other simplifications. This patch (of 11): Unify the KSM and DAX codepaths by calculating the addr in add_to_kill_fsdax() instead of telling __add_to_kill() to calculate it. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Acked-by: Miaohe Lin <[email protected]> Reviewed-by: Jane Chu <[email protected]> Reviewed-by: Dan Williams <[email protected]> Reviewed-by: Oscar Salvador <[email protected]> Signed-off-by: Andrew Morton <[email protected]> (cherry picked from commit 1c0501e)
Handle anon/file folios the same way as KSM & DAX folios by passing in the address.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Acked-by: Miaohe Lin <[email protected]>
Reviewed-by: Jane Chu <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
Cc: Dan Williams <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
(cherry picked from commit f2b3719)
The only user of this function calls page_address_in_vma() immediately after page_mapped_in_vma() calculates it and uses it to return true/false. Return the address instead, allowing memory-failure to skip the call to page_address_in_vma().

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Acked-by: Miaohe Lin <[email protected]>
Reviewed-by: Jane Chu <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Oscar Salvador <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
(cherry picked from commit 37bc2ff)
This function is only currently used by the memory-failure code, so we can omit it if we're not compiling in the memory-failure code.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Suggested-by: Miaohe Lin <[email protected]>
Reviewed-by: Jane Chu <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
Acked-by: Miaohe Lin <[email protected]>
Cc: Dan Williams <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
(cherry picked from commit b87f978)
Removes two calls to compound_head(). Move the prototype to internal.h; we definitely don't want code outside mm using it.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Reviewed-by: Jane Chu <[email protected]>
Acked-by: Miaohe Lin <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Miaohe Lin <[email protected]>
Cc: Oscar Salvador <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
(cherry picked from commit fed5348)
The page is only used to get the mapping, so the folio will do just as well. Both callers already have a folio available, so this saves a call to compound_head().

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Reviewed-by: Jane Chu <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
Acked-by: Miaohe Lin <[email protected]>
Cc: Dan Williams <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
(cherry picked from commit 6e8cda4)
Saves dozens of calls to compound_head().

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Acked-by: Miaohe Lin <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Jane Chu <[email protected]>
Cc: Miaohe Lin <[email protected]>
Cc: Oscar Salvador <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
(cherry picked from commit 5dba5c3)
Pass the folio from the callers, and use it throughout instead of hpage. Saves dozens of calls to compound_head().

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Acked-by: Miaohe Lin <[email protected]>
Reviewed-by: Jane Chu <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Oscar Salvador <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
(cherry picked from commit 03468a0)
Some of these folio APIs didn't exist when the unpoison_memory() conversion was done originally.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Acked-by: Miaohe Lin <[email protected]>
Reviewed-by: Jane Chu <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Oscar Salvador <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
(cherry picked from commit ee299e9)
Saves a couple of calls to compound_head().

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Reviewed-by: Jane Chu <[email protected]>
Acked-by: Miaohe Lin <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Miaohe Lin <[email protected]>
Cc: Oscar Salvador <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
(cherry picked from commit 0edb5b2)
We've already calculated it, so pass it in instead of recalculating it in collect_procs_ksm().

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Reviewed-by: Jane Chu <[email protected]>
Reviewed-by: Miaohe Lin <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Miaohe Lin <[email protected]>
Cc: Oscar Salvador <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
(cherry picked from commit b650e1d)
Patch series "mm: memory_hotplug: improve do_migrate_range()", v3. Unify hwpoisoned page handling and isolation of HugeTLB/LRU/non-LRU movable page, also convert to use folios in do_migrate_range(). This patch (of 5): Directly use a folio for HugeTLB and THP when calculate the next pfn, then remove unused head variable. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Kefeng Wang <[email protected]> Reviewed-by: Miaohe Lin <[email protected]> Acked-by: David Hildenbrand <[email protected]> Cc: Dan Carpenter <[email protected]> Cc: Jonathan Cameron <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Oscar Salvador <[email protected]> Signed-off-by: Andrew Morton <[email protected]> (cherry picked from commit b62b51d)
Add an unmap_poisoned_folio() helper which will be reused by do_migrate_range() from memory hotplug soon.

[[email protected]: whitespace tweak, per Miaohe Lin]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kefeng Wang <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Acked-by: Miaohe Lin <[email protected]>
Cc: Dan Carpenter <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Oscar Salvador <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
(cherry picked from commit 16038c4)
mainline inclusion
from mainline-v6.15-rc1
category: bugfix

commit 1b04495 upstream.

Syzkaller reports a bug as follows:

Injecting memory failure for pfn 0x18b00e at process virtual address 0x20ffd000
Memory failure: 0x18b00e: dirty swapcache page still referenced by 2 users
Memory failure: 0x18b00e: recovery action for dirty swapcache page: Failed
page: refcount:2 mapcount:0 mapping:0000000000000000 index:0x20ffd pfn:0x18b00e memcg:ffff0000dd6d9000
anon flags: 0x5ffffe00482011(locked|dirty|arch_1|swapbacked|hwpoison|node=0|zone=2|lastcpupid=0xfffff)
raw: 005ffffe00482011 dead000000000100 dead000000000122 ffff0000e232a7c9
raw: 0000000000020ffd 0000000000000000 00000002ffffffff ffff0000dd6d9000
page dumped because: VM_BUG_ON_FOLIO(!folio_test_uptodate(folio))
------------[ cut here ]------------
kernel BUG at mm/swap_state.c:184!
Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
Modules linked in:
CPU: 0 PID: 60 Comm: kswapd0 Not tainted 6.6.0-gcb097e7de84e #3
Hardware name: linux,dummy-virt (DT)
pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : add_to_swap+0xbc/0x158
lr : add_to_swap+0xbc/0x158
sp : ffff800087f37340
x29: ffff800087f37340 x28: fffffc00052c0380 x27: ffff800087f37780
x26: ffff800087f37490 x25: ffff800087f37c78 x24: ffff800087f377a0
x23: ffff800087f37c50 x22: 0000000000000000 x21: fffffc00052c03b4
x20: 0000000000000000 x19: fffffc00052c0380 x18: 0000000000000000
x17: 296f696c6f662865 x16: 7461646f7470755f x15: 747365745f6f696c
x14: 6f6621284f494c4f x13: 0000000000000001 x12: ffff600036d8b97b
x11: 1fffe00036d8b97a x10: ffff600036d8b97a x9 : dfff800000000000
x8 : 00009fffc9274686 x7 : ffff0001b6c5cbd3 x6 : 0000000000000001
x5 : ffff0000c25896c0 x4 : 0000000000000000 x3 : 0000000000000000
x2 : 0000000000000000 x1 : ffff0000c25896c0 x0 : 0000000000000000
Call trace:
 add_to_swap+0xbc/0x158
 shrink_folio_list+0x12ac/0x2648
 shrink_inactive_list+0x318/0x948
 shrink_lruvec+0x450/0x720
 shrink_node_memcgs+0x280/0x4a8
 shrink_node+0x128/0x978
 balance_pgdat+0x4f0/0xb20
 kswapd+0x228/0x438
 kthread+0x214/0x230
 ret_from_fork+0x10/0x20

I can reproduce this issue with the following steps:

1) When a dirty swapcache page is isolated by the reclaim process and the page isn't locked, inject memory failure for the page. me_swapcache_dirty() clears the uptodate flag and tries to delete the page from the lru, but fails; the reclaim process will put the hwpoisoned page back on the lru.

2) The process that maps the hwpoisoned page exits, the page is deleted; the page will never be freed and will stay on the lru forever.

3) If we trigger a reclaim again and try to reclaim the page, add_to_swap() will trigger VM_BUG_ON_FOLIO because the uptodate flag is cleared.

To fix it, skip the hwpoisoned page in shrink_folio_list(). Besides, the hwpoison folio may not have been unmapped by hwpoison_user_mappings() yet; unmap it in shrink_folio_list(), otherwise the folio will fail to be unmapped by hwpoison_user_mappings(), since the folio isn't on the lru list.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Jinjiang Tu <[email protected]>
Acked-by: Miaohe Lin <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Kefeng Wang <[email protected]>
Cc: Nanyong Sun <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
(cherry picked from commit 1b04495)
[Guan Wentao: add helper from commit ("mm/hwpoison: introduce folio_contain_hwpoisoned_page() helper")]
Signed-off-by: Wentao Guan <[email protected]>
Force-pushed from 1995645 to 84a1e3b
mainline inclusion
from mainline-v6.15-rc1
category: bugfix
commit 1b04495 upstream.
Syzkaller reports a bug as follows:
Injecting memory failure for pfn 0x18b00e at process virtual address 0x20ffd000
Memory failure: 0x18b00e: dirty swapcache page still referenced by 2 users
Memory failure: 0x18b00e: recovery action for dirty swapcache page: Failed
page: refcount:2 mapcount:0 mapping:0000000000000000 index:0x20ffd pfn:0x18b00e memcg:ffff0000dd6d9000
anon flags: 0x5ffffe00482011(locked|dirty|arch_1|swapbacked|hwpoison|node=0|zone=2|lastcpupid=0xfffff)
raw: 005ffffe00482011 dead000000000100 dead000000000122 ffff0000e232a7c9
raw: 0000000000020ffd 0000000000000000 00000002ffffffff ffff0000dd6d9000
page dumped because: VM_BUG_ON_FOLIO(!folio_test_uptodate(folio))
------------[ cut here ]------------
kernel BUG at mm/swap_state.c:184!
Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
Modules linked in:
CPU: 0 PID: 60 Comm: kswapd0 Not tainted 6.6.0-gcb097e7de84e #3
Hardware name: linux,dummy-virt (DT)
pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : add_to_swap+0xbc/0x158
lr : add_to_swap+0xbc/0x158
sp : ffff800087f37340
x29: ffff800087f37340 x28: fffffc00052c0380 x27: ffff800087f37780
x26: ffff800087f37490 x25: ffff800087f37c78 x24: ffff800087f377a0
x23: ffff800087f37c50 x22: 0000000000000000 x21: fffffc00052c03b4
x20: 0000000000000000 x19: fffffc00052c0380 x18: 0000000000000000
x17: 296f696c6f662865 x16: 7461646f7470755f x15: 747365745f6f696c
x14: 6f6621284f494c4f x13: 0000000000000001 x12: ffff600036d8b97b
x11: 1fffe00036d8b97a x10: ffff600036d8b97a x9 : dfff800000000000
x8 : 00009fffc9274686 x7 : ffff0001b6c5cbd3 x6 : 0000000000000001
x5 : ffff0000c25896c0 x4 : 0000000000000000 x3 : 0000000000000000
x2 : 0000000000000000 x1 : ffff0000c25896c0 x0 : 0000000000000000
Call trace:
add_to_swap+0xbc/0x158
shrink_folio_list+0x12ac/0x2648
shrink_inactive_list+0x318/0x948
shrink_lruvec+0x450/0x720
shrink_node_memcgs+0x280/0x4a8
shrink_node+0x128/0x978
balance_pgdat+0x4f0/0xb20
kswapd+0x228/0x438
kthread+0x214/0x230
ret_from_fork+0x10/0x20
I can reproduce this issue with the following steps:
1) When a dirty swapcache page is isolated by the reclaim process and the
   page isn't locked, inject memory failure for the page (a userspace
   injection sketch follows these steps). me_swapcache_dirty() clears the
   uptodate flag and tries to delete the page from the lru, but fails;
   the reclaim process will put the hwpoisoned page back on the lru.

2) The process that maps the hwpoisoned page exits, the page is deleted;
   the page will never be freed and will stay on the lru forever.

3) If we trigger a reclaim again and try to reclaim the page,
   add_to_swap() will trigger VM_BUG_ON_FOLIO because the uptodate flag
   is cleared.
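For step 1, memory failure can be injected from userspace with madvise(MADV_HWPOISON). A minimal standalone sketch follows; it requires root and a kernel built with CONFIG_MEMORY_FAILURE, and the single-page anonymous mapping is purely illustrative:

```c
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
	size_t len = 4096;

	/* Map and dirty one anonymous page so there is something to poison. */
	char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED) {
		perror("mmap");
		return 1;
	}
	memset(p, 0x5a, len);

	/* Ask the kernel to handle the page as if it took a memory error. */
	if (madvise(p, len, MADV_HWPOISON)) {
		perror("madvise(MADV_HWPOISON)");
		return 1;
	}
	return 0;
}
```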
To fix it, skip the hwpoisoned page in shrink_folio_list(). Besides, the hwpoison folio may not have been unmapped by hwpoison_user_mappings() yet; unmap it in shrink_folio_list(), otherwise the folio will fail to be unmapped by hwpoison_user_mappings(), since the folio isn't on the lru list.
Link: https://lkml.kernel.org/r/[email protected]
Acked-by: Miaohe Lin <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Kefeng Wang <[email protected]>
Cc: Nanyong Sun <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: <[email protected]>
(cherry picked from commit 1b04495)
[Guan Wentao: add helper from commit ("mm/hwpoison: introduce folio_contain_hwpoisoned_page() helper")]
Summary by Sourcery
Skip and unmap hwpoisoned folios during reclaim to avoid VM_BUG_ON crashes on poisoned pages, and introduce a helper to detect hwpoisoned folios.