@opsiff opsiff commented Nov 27, 2025

deepin inclusion
category: performance

Disable LSE (Large System Extensions) atomic instructions on some systems to improve performance of per-CPU atomic operations. LSE atomics can exhibit significant overhead on certain microarchitectures (e.g., TSV110) due to "far atomic" implementations bypassing the L1 cache. LL/SC (Load-Link/Store-Conditional) is substantially faster.

The default value is 0, which leaves the check enabled and automatically disables LSE on affected systems. Set to 1 to skip the check and keep LSE enabled regardless of performance impact.
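The parameter semantics described above can be sketched as a small user-space model. This is only an illustration of the stated policy, not the kernel code: the function name use_lse_atomics and its three arguments are hypothetical stand-ins for the boot flag, the CPU-list lookup, and the normal feature detection.

```c
#include <stdbool.h>

/*
 * User-space model of the policy; all names are hypothetical.
 *   skip_check  - value of lse_disable_check (false when absent)
 *   cpu_in_list - CPU model is on the "LSE slower than LL/SC" list
 *   cpu_has_lse - CPU advertises LSE atomics in its ID registers
 */
static bool use_lse_atomics(bool skip_check, bool cpu_in_list, bool cpu_has_lse)
{
	/* Default path: a listed CPU falls back to LL/SC. */
	if (!skip_check && cpu_in_list)
		return false;

	/* Otherwise the usual feature detection decides. */
	return cpu_has_lse;
}
```

With the default (skip_check false) a listed CPU reports no LSE; with lse_disable_check=1 the same CPU reports whatever the ID registers say. A CPU without LSE never gains it from the parameter.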

When this feature is active, the kernel logs:
"LSE atomics: use llsc for performance, use lse_disable_check=1 to disable the feature."

PS:
Tested with byte-unixbench6 on a kp920 with 24 cores and 64 GB of memory: the overall score improves by 3.8%.

Tested with percpu_bench from https://github.com/leitao/debug/tree/main/LSE:
CPU0:
LSE (stadd) (c 0, d 0): p50: 029.00 ns p95: 029.09 ns p99: 029.12 ns
LL/SC (c 0, d 0): p50: 006.57 ns p95: 006.57 ns p99: 006.57 ns
LDADD (c 0, d 0): p50: 069.55 ns p95: 069.57 ns p99: 069.71 ns
CPU1:
LSE (stadd) (c 0, d 0): p50: 005.79 ns p95: 029.00 ns p99: 029.01 ns
LL/SC (c 0, d 0): p50: 006.56 ns p95: 006.58 ns p99: 006.58 ns
LDADD (c 0, d 0): p50: 010.04 ns p95: 010.06 ns p99: 010.06 ns
CPU2:
LSE (stadd) (c 0, d 0): p50: 005.79 ns p95: 005.79 ns p99: 005.79 ns
LL/SC (c 0, d 0): p50: 006.57 ns p95: 006.57 ns p99: 006.57 ns
LDADD (c 0, d 0): p50: 069.53 ns p95: 069.56 ns p99: 069.58 ns
...
CPU23:
LSE (stadd) (c 0, d 0): p50: 005.79 ns p95: 005.79 ns p99: 005.79 ns
LL/SC (c 0, d 0): p50: 006.57 ns p95: 006.57 ns p99: 006.57 ns
LDADD (c 0, d 0): p50: 064.93 ns p95: 064.95 ns p99: 064.97 ns
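To make the posted numbers easier to compare, the p50 values can be turned into ratios against LL/SC. The sketch below only restates the figures from the results above; the struct and function names are hypothetical and are not part of percpu_bench.

```c
#include <stdio.h>

/* p50 latencies in ns, copied from the results posted above. */
struct p50_sample {
	int cpu;
	double stadd_ns;	/* LSE (stadd) */
	double llsc_ns;		/* LL/SC */
	double ldadd_ns;	/* LDADD */
};

static const struct p50_sample posted[] = {
	{  0, 29.00, 6.57, 69.55 },
	{  1,  5.79, 6.56, 10.04 },
	{  2,  5.79, 6.57, 69.53 },
	{ 23,  5.79, 6.57, 64.93 },
};

/* A ratio above 1.0 means the LSE variant is slower than LL/SC. */
static double ratio_vs_llsc(double lse_ns, double llsc_ns)
{
	return lse_ns / llsc_ns;
}

/* Print one line per CPU with both ratios. */
static void print_ratios(void)
{
	for (size_t i = 0; i < sizeof(posted) / sizeof(posted[0]); i++)
		printf("CPU%d: stadd/llsc=%.2f ldadd/llsc=%.2f\n",
		       posted[i].cpu,
		       ratio_vs_llsc(posted[i].stadd_ns, posted[i].llsc_ns),
		       ratio_vs_llsc(posted[i].ldadd_ns, posted[i].llsc_ns));
}
```

On these figures LDADD is slower than LL/SC at p50 on every listed CPU, while stadd is slower at p50 only on CPU0.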

Link: https://lore.kernel.org/r/e7d539ed-ced0-4b96-8ecd-048a5b803b85@paulmck-laptop [1]
Link: #1320
Link: #1302

(cherry picked from commit 6afecf6)
Conflicts:
	Documentation/admin-guide/kernel-parameters.txt

Summary by Sourcery

Gate enabling of ARM64 LSE atomic instructions behind a CPU-specific performance check and optional boot-time override to prefer LL/SC atomics on affected systems.

New Features:

  • Add a boot parameter to control whether LSE atomics are conditionally disabled based on CPU model.

Enhancements:

  • Introduce a CPU model blacklist (e.g., HiSilicon TSV110) to automatically disable LSE atomics when they are slower than LL/SC, improving per-CPU atomic performance on those systems.

sourcery-ai bot commented Nov 27, 2025

Reviewer's Guide

Gate ARM64 LSE atomic instruction support behind a runtime CPU-model check that can automatically disable LSE on known-slow microarchitectures, controlled by a new boot-time kernel parameter, and wire this check into the existing ARM64 LSE capability detection and documentation.

Sequence diagram for ARM64 LSE capability check with lse_disable_check parameter

sequenceDiagram
    actor Admin
    participant Bootloader
    participant Kernel
    participant arm64_lse_disable_check
    participant has_lse_capability_check
    participant has_cpuid_feature

    Admin->>Bootloader: Configure kernel cmdline (lse_disable_check=0 or 1)
    Bootloader->>Kernel: Pass kernel cmdline

    Kernel->>arm64_lse_disable_check: early_param lse_disable_check
    arm64_lse_disable_check->>Kernel: Set lse_disable_check (default 0 if absent)

    Kernel->>has_lse_capability_check: Evaluate ARM64_HAS_LSE_ATOMICS
    has_lse_capability_check->>has_lse_capability_check: Read lse_disable_check
    has_lse_capability_check->>has_lse_capability_check: Check CPU midr in lse_disable_list
    alt lse_disable_check == 0 and CPU in lse_disable_list
        has_lse_capability_check-->>Kernel: Return false (disable LSE)
        Kernel->>Kernel: Log info "LSE atomics: use llsc for performance..." (system scope)
    else lse_disable_check == 1 or CPU not in list
        has_lse_capability_check->>has_cpuid_feature: Delegate capability test
        has_cpuid_feature-->>has_lse_capability_check: Return CPUID-based result
        has_lse_capability_check-->>Kernel: Return CPUID-based result
    end

    Kernel->>Kernel: Configure atomic implementation (LSE or LL/SC)
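The first half of the sequence above (command line in, flag out) can be modelled in user space as follows. This is a simplified sketch under stated assumptions: the kernel uses early_param and kstrtobool for this, whereas parse_bool and skip_check_from_cmdline here are hypothetical helpers that accept only a subset of the spellings kstrtobool understands.

```c
#include <stdbool.h>
#include <string.h>

/* Simplified stand-in for kstrtobool(): looks at the first character only. */
static int parse_bool(const char *s, bool *res)
{
	if (!s)
		return -1;
	switch (s[0]) {
	case '1': case 'y': case 'Y':
		*res = true;
		return 0;
	case '0': case 'n': case 'N':
		*res = false;
		return 0;
	default:
		return -1;	/* unrecognised value: leave *res untouched */
	}
}

/* Hypothetical helper: flag value implied by a kernel command line. */
static bool skip_check_from_cmdline(const char *cmdline)
{
	static const char key[] = "lse_disable_check=";
	bool skip = false;	/* default when the parameter is absent */
	const char *p = cmdline;

	while ((p = strstr(p, key)) != NULL) {
		/* Only accept the key at the start of a space-separated token. */
		if (p == cmdline || p[-1] == ' ')
			parse_bool(p + strlen(key), &skip);
		p += strlen(key);
	}
	return skip;
}
```

In this model the last valid occurrence on the command line wins, and an unparsable value leaves the default in place.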

File-Level Changes

Change Details Files
Add a runtime CPU-model-based check and boot-time parameter to optionally disable ARM64 LSE atomics on specific microarchitectures while reusing the existing capability detection path.
  • Introduce a __read_mostly global flag and an early_param handler to parse the new lse_disable_check= boot argument into a boolean.
  • Implement has_lse_capability_check(), which maintains a midr_range list of CPUs where LSE is slower than LL/SC (currently including HISI TSV110), and returns false for those CPUs unless the override parameter is set.
  • Emit an informational pr_info message once at system scope when LSE is auto-disabled for performance, guiding users to lse_disable_check=1 to revert the behavior.
  • Replace the LSE capability .matches callback in the arm64_features[] table to use has_lse_capability_check() instead of the generic has_cpuid_feature(), ensuring the new policy integrates with the normal feature detection flow.
arch/arm64/kernel/cpufeature.c
Document the new lse_disable_check boot-time kernel parameter for controlling LSE atomic behavior on ARM64.
  • Describe the purpose and expected values of lse_disable_check, including its default behavior and effect on LSE atomics for certain CPUs.
  • Mention the associated kernel log message so users can correlate runtime behavior with the parameter.
Documentation/admin-guide/kernel-parameters.txt
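The CPU-model lookup described for cpufeature.c can be illustrated with a user-space sketch of matching a MIDR value against a sentinel-terminated table. It assumes the architectural MIDR_EL1 layout (implementer in bits 31:24, part number in bits 15:4) and, to the best of my knowledge, implementer 0x48 and part number 0xd01 for HiSilicon TSV110; the struct, macro, and function names are hypothetical and do not reproduce the kernel's midr_range helpers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Field extraction for MIDR_EL1 (assumed layout, see lead-in). */
#define MIDR_IMPLEMENTER(midr)	(((midr) >> 24) & 0xffu)
#define MIDR_PARTNUM(midr)	(((midr) >> 4) & 0xfffu)

struct cpu_model {
	uint8_t implementer;
	uint16_t partnum;
};

/* Assumed IDs for HiSilicon TSV110; variant and revision are ignored. */
static const struct cpu_model lse_slow_models[] = {
	{ 0x48, 0xd01 },
	{ 0, 0 }	/* sentinel */
};

/* Walk the table until the sentinel and compare implementer and part. */
static bool cpu_model_in_list(uint32_t midr, const struct cpu_model *list)
{
	for (; list->implementer != 0; list++) {
		if (MIDR_IMPLEMENTER(midr) == list->implementer &&
		    MIDR_PARTNUM(midr) == list->partnum)
			return true;
	}
	return false;
}
```

Ignoring variant and revision mirrors the "all versions" matching that the PR uses for TSV110; a stricter table would add those fields as a range.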

@deepin-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from opsiff. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@sourcery-ai sourcery-ai bot left a comment

Hey there - I've reviewed your changes - here's some feedback:

  • The logic around lse_disable_check is a bit confusing: the variable name, the default value of 0, the in-code comment (/* Disable LSE when lse_disable_check is enabled */), and the pr_info message don’t all line up with what the code actually does; consider inverting the boolean or renaming it and updating the comment/log message so it’s clear that 1 means “skip the auto-disable check and keep LSE enabled.”
  • Since lse_disable_check is only mutated during early boot via early_param and then treated as read-only, it would be more appropriate to use __ro_after_init instead of __read_mostly for stronger protection and clearer intent.
  • The lse_disable_list could be moved to file scope as a static const table rather than being re-declared on each call to has_lse_capability_check, which would make it easier to extend and slightly reduce per-call overhead.
## Individual Comments

### Comment 1
<location> `Documentation/admin-guide/kernel-parameters.txt:3261` </location>
<code_context>
+			(Load-Link/Store-Conditional) is substantially faster.
+
+			The default value is 0 (enabled), which automatically
+			disables LSE on some systems. Set to 1 to bypassing
+			the automatic disabling of LSE on affected systems.
+
</code_context>

<issue_to_address>
**issue (typo):** Fix the grammatical issue in "Set to 1 to bypassing"

Please change "Set to 1 to bypassing the automatic disabling of LSE on affected systems" to "Set to 1 to bypass the automatic disabling of LSE on affected systems."
</issue_to_address>

deepin inclusion
category: performance

Disable LSE (Large System Extensions) atomic instructions
on some systems to improve performance of per-CPU atomic
operations. LSE atomics can exhibit significant overhead
on certain microarchitectures (e.g., TSV110) due
to "far atomic" implementations bypassing the L1 cache. LL/SC
(Load-Link/Store-Conditional) is substantially faster.

The default value is 0, which leaves the check enabled and
automatically disables LSE on affected systems. Set to 1 to
skip the check and keep LSE enabled regardless of performance
impact.

When this feature is active, the kernel logs:
"LSE atomics: use llsc for performance, use lse_disable_check=1 to disable the feature."

PS:
Tested with byte-unixbench6 on a kp920 with 24 cores and 64 GB of
memory: the overall score improves by 3.8%.

Tested with percpu_bench from https://github.com/leitao/debug/tree/main/LSE:
CPU0:
LSE (stadd) (c 0, d 0): p50: 029.00 ns p95: 029.09 ns p99: 029.12 ns
LL/SC (c 0, d 0): p50: 006.57 ns p95: 006.57 ns p99: 006.57 ns
LDADD (c 0, d 0): p50: 069.55 ns p95: 069.57 ns p99: 069.71 ns
CPU1:
LSE (stadd) (c 0, d 0): p50: 005.79 ns p95: 029.00 ns p99: 029.01 ns
LL/SC (c 0, d 0): p50: 006.56 ns p95: 006.58 ns p99: 006.58 ns
LDADD (c 0, d 0): p50: 010.04 ns p95: 010.06 ns p99: 010.06 ns
CPU2:
LSE (stadd) (c 0, d 0): p50: 005.79 ns p95: 005.79 ns p99: 005.79 ns
LL/SC (c 0, d 0): p50: 006.57 ns p95: 006.57 ns p99: 006.57 ns
LDADD (c 0, d 0): p50: 069.53 ns p95: 069.56 ns p99: 069.58 ns
...
CPU23:
LSE (stadd) (c 0, d 0): p50: 005.79 ns p95: 005.79 ns p99: 005.79 ns
LL/SC (c 0, d 0): p50: 006.57 ns p95: 006.57 ns p99: 006.57 ns
LDADD (c 0, d 0): p50: 064.93 ns p95: 064.95 ns p99: 064.97 ns

Link: https://lore.kernel.org/r/e7d539ed-ced0-4b96-8ecd-048a5b803b85@paulmck-laptop [1]
Link: deepin-community#1320
Link: deepin-community#1302
Signed-off-by: Wentao Guan <[email protected]>
(cherry picked from commit 6afecf6)
Conflicts:
	Documentation/admin-guide/kernel-parameters.txt
@opsiff opsiff force-pushed the linux-6.12.y-2025-11-27-midr-lse branch from 192b2af to 2e0e00f Compare November 27, 2025 11:13
@deepin-ci-robot

deepin pr auto review

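Here is a detailed code review of this Git diff:

  1. Syntax and logic:
  • The syntax is correct; function definitions and variable declarations follow the Linux kernel coding style
  • The __read_mostly attribute is used to optimize variable access
  • early_param is used correctly to handle the early boot parameter
  • The logic of has_lse_capability_check is clear and its conditions are reasonable
  2. Code quality:
  • Strengths:
    • Detailed documentation was added
    • The CPU model list is defined with the MIDR macros, which makes it easy to maintain
    • The log message is clear and gives the user an actionable hint
  • Possible improvements:
    • The variable name lse_disable_check may not be intuitive enough; renaming it to force_enable_lse would better match what it actually does
    • The cap parameter of has_lse_capability_check is unused; consider removing it or adding a comment explaining why
  3. Performance:
  • The __read_mostly attribute optimizes variable access
  • A static array stores the CPU list, which avoids dynamic memory allocation
  • is_midr_in_range_list provides efficient CPU ID matching
  • Consider returning early when lse_disable_check is 1 to avoid an unnecessary CPU ID check
  4. Security:
  • kstrtobool is used to convert the string parameter safely
  • Scope is controlled appropriately, with static limiting the visibility of variables and functions
  • The SCOPE_SYSTEM check ensures the log message is printed only once, at system level

Suggested improvement: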

static bool force_enable_lse __read_mostly;

static int __init arm64_force_enable_lse(char *str)
{
    return kstrtobool(str, &force_enable_lse);
}
early_param("lse_disable_check", arm64_force_enable_lse);

static bool has_lse_capability_check(const struct arm64_cpu_capabilities *cap,
                                     int scope)
{
    /* List of CPUs where LSE is slower than LL/SC */
    static const struct midr_range lse_disable_list[] = {
        MIDR_ALL_VERSIONS(MIDR_HISI_TSV110),
        { /* sentinel */ }
    };

    /* If LSE is forced enabled, skip the check */
    if (force_enable_lse)
        return has_cpuid_feature(cap, scope);

    /* Check if current CPU is in the disable list */
    if (is_midr_in_range_list(read_cpuid_id(), lse_disable_list)) {
        if (scope == SCOPE_SYSTEM)
            pr_info("LSE atomics: use llsc for performance, use lse_disable_check=1 to disable the feature.\n");
        return false;
    }

    return has_cpuid_feature(cap, scope);
}

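The main improvements are:

  1. Rename the variable to make it more intuitive
  2. Add an early return to improve performance
  3. Improve the comments to make them clearer
  4. Improve readability while keeping the original behavior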

Copilot AI left a comment

Pull request overview

This PR introduces a mechanism to selectively disable LSE (Large System Extensions) atomic instructions on ARM64 systems where LL/SC (Load-Link/Store-Conditional) operations perform better than LSE atomics. The change addresses performance issues on certain microarchitectures (e.g., HiSilicon TSV110) where LSE "far atomic" implementations bypass L1 cache, resulting in significant overhead for per-CPU atomic operations.

Key Changes:

  • Add runtime CPU model detection to conditionally disable LSE atomics on affected systems
  • Introduce lse_disable_check boot parameter to override automatic LSE disabling behavior
  • Document the new parameter with usage guidelines and performance context

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

File: Description
arch/arm64/kernel/cpufeature.c: Implements has_lse_capability_check() function to detect affected CPU models (TSV110) and conditionally disable LSE atomics based on boot parameter and CPU blacklist
Documentation/admin-guide/kernel-parameters.txt: Documents the new lse_disable_check boot parameter including default behavior, usage examples, and performance rationale

early_param("lse_disable_check", arm64_lse_disable_check);

static bool has_lse_capability_check(const struct arm64_cpu_capabilities *cap,
int scope)
Copilot AI Nov 28, 2025

Incorrect indentation: The function parameters should be aligned properly. The second line should use tabs to align with the opening parenthesis of the first parameter, not excessive spaces. This is inconsistent with the coding style used elsewhere in the file (see has_useable_gicv3_cpuif at line 1687 for reference).

Suggested change
int scope)
int scope)

Copilot uses AI. Check for mistakes.
static bool has_lse_capability_check(const struct arm64_cpu_capabilities *cap,
int scope)
{
/* List of CPUs that LSE are slow more than llsc */
Copilot AI Nov 28, 2025

Grammar error: "LSE are slow" should be "LSE is slower" or "LSE atomics are slower". The comment is grammatically incorrect as written.

Suggested change
/* List of CPUs that LSE are slow more than llsc */
/* List of CPUs where LSE atomics are slower than llsc */

};

/* Disable LSE when lse_disable_check is 0 and in lse_disable_list */
if (lse_disable_check == 0 && is_midr_in_range_list(read_cpuid_id(), lse_disable_list)) {
Copilot AI Nov 28, 2025

[nitpick] The comparison lse_disable_check == 0 should use !lse_disable_check for consistency with idiomatic C boolean checks. This is more readable and follows common kernel coding patterns.

Suggested change
if (lse_disable_check == 0 && is_midr_in_range_list(read_cpuid_id(), lse_disable_list)) {
if (!lse_disable_check && is_midr_in_range_list(read_cpuid_id(), lse_disable_list)) {

Comment on lines +3260 to +3263
The default value is 0 (enabled), which automatically
disables LSE on some systems. Set to 1 to bypass
the automatic disabling of LSE on affected systems.

Copilot AI Nov 28, 2025

The documentation is confusing. Line 3260 states "default value is 0 (enabled)" but the variable name is lse_disable_check, and when it's 0 (false), the code disables LSE on affected CPUs. The phrasing should clarify what is being "enabled" - the automatic disabling feature, not LSE itself. Consider rewording to: "The default value is 0, which enables automatic disabling of LSE on affected systems. Set to 1 to skip this check and keep LSE enabled regardless of CPU model."

Suggested change
The default value is 0 (enabled), which automatically
disables LSE on some systems. Set to 1 to bypass
the automatic disabling of LSE on affected systems.
The default value is 0, which enables automatic disabling of LSE on affected systems.
Set to 1 to skip this check and keep LSE enabled regardless of CPU model.

Comment on lines +1660 to +1666
static bool lse_disable_check __read_mostly;

static int __init arm64_lse_disable_check(char *str)
{
return kstrtobool(str, &lse_disable_check);
}
early_param("lse_disable_check", arm64_lse_disable_check);
Copilot AI Nov 28, 2025

[nitpick] The parameter name lse_disable_check is confusing and has inverted logic. When set to 1, it actually enables LSE (by disabling the check that would disable LSE). A clearer name would be lse_force_enable or lse_skip_check, where the meaning is more intuitive (1 = force enable LSE, 0 = allow automatic disabling). The current name requires double-negative reasoning ("disable" the "disable check").
