[WPD]: Apply speculative WPD in non-lto mode. #145031

Open: wants to merge 1 commit into base: main

clang/docs/UsersManual.rst: 8 changes (6 additions, 2 deletions)
@@ -2275,9 +2275,13 @@ are listed below.
 
 .. option:: -fwhole-program-vtables
 
+   In LTO mode:
    Enable whole-program vtable optimizations, such as single-implementation
    devirtualization and virtual constant propagation, for classes with
-   :doc:`hidden LTO visibility <LTOVisibility>`. Requires ``-flto``.
+   :doc:`hidden LTO visibility <LTOVisibility>`.
+   In non-LTO mode:
+   Enables only speculative devirtualization, without the other optimizations,
+   and does not require ``-flto`` or hidden LTO visibility. Has no effect at ``-O0``.
 
 .. option:: -f[no]split-lto-unit
 
@@ -5170,7 +5174,7 @@ Execute ``clang-cl /?`` to see a list of supported options:
     -fstandalone-debug      Emit full debug info for all types used by the program
     -fstrict-aliasing       Enable optimizations based on strict aliasing rules
     -fsyntax-only           Run the preprocessor, parser and semantic analysis stages
-    -fwhole-program-vtables Enables whole-program vtable optimization. Requires -flto
+    -fwhole-program-vtables Enables whole-program vtable optimization.
     -gcodeview-ghash        Emit type record hashes in a .debug$H section
     -gcodeview              Generate CodeView debug information
     -gline-directives-only  Emit debug line info directives only
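In non-LTO mode the option enables only speculative devirtualization. As a rough illustration, here is a hand-written C++ analogue of what that transformation does to a call such as `a->virtual1()`: load the vtable slot, compare it with the one implementation the analysis selected, call that implementation directly on a match (which lets the optimizer inline it), and otherwise keep the original indirect call. Portable C++ cannot inspect compiler-generated vtables, so this sketch assumes a hypothetical hand-rolled vtable (`VTable`, `Obj`, `A_virtual1`, `B_virtual1`); its compare, direct-call, and fallback branches mirror the `icmp eq`, `if.true.direct_targ`, and `if.false.orig_indirect` blocks in the CHECK lines of devirt-single-impl.cpp.

```cpp
#include <cassert>

struct Obj;
// Hypothetical hand-rolled vtable with a single slot for virtual1().
using Virtual1Fn = int (*)(Obj *);
struct VTable {
  Virtual1Fn virtual1;
};
// Every object starts with a pointer to its class's vtable.
struct Obj {
  const VTable *vptr;
};

int A_virtual1(Obj *) { return 20; }
int B_virtual1(Obj *) { return 50; }

const VTable A_vtable{A_virtual1};
const VTable B_vtable{B_virtual1};

// Plain virtual call: load the slot and call through the pointer.
int callIndirect(Obj *o) { return o->vptr->virtual1(o); }

// Speculatively devirtualized call, assuming the analysis picked
// B_virtual1 as the likely target.
int callSpeculative(Obj *o) {
  Virtual1Fn target = o->vptr->virtual1; // load from the vtable slot
  if (target == B_virtual1)              // the guess matches
    return B_virtual1(o);                // direct, inlinable call
  return target(o);                      // fallback: original indirect call
}
```

Because the indirect call survives on the fallback path, a wrong guess costs only a compare and a branch, which is why this mode can run without LTO or visibility guarantees.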
clang/lib/CodeGen/BackendUtil.cpp: 1 change (1 addition, 0 deletions)
@@ -902,6 +902,7 @@ void EmitAssemblyHelper::RunOptimizationPipeline(
   // non-integrated assemblers don't recognize .cgprofile section.
   PTO.CallGraphProfile = !CodeGenOpts.DisableIntegratedAS;
   PTO.UnifiedLTO = CodeGenOpts.UnifiedLTO;
+  PTO.WholeProgramDevirt = CodeGenOpts.WholeProgramVTables;
 
   LoopAnalysisManager LAM;
   FunctionAnalysisManager FAM;
clang/lib/CodeGen/CGVTables.cpp: 3 changes (2 additions, 1 deletion)
@@ -1359,7 +1359,8 @@ void CodeGenModule::EmitVTableTypeMetadata(const CXXRecordDecl *RD,
   // Emit type metadata on vtables with LTO or IR instrumentation.
   // In IR instrumentation, the type metadata is used to find out vtable
   // definitions (for type profiling) among all global variables.
-  if (!getCodeGenOpts().LTOUnit && !getCodeGenOpts().hasProfileIRInstr())
+  if (!getCodeGenOpts().LTOUnit && !getCodeGenOpts().hasProfileIRInstr() &&
+      !getCodeGenOpts().WholeProgramVTables)
     return;
 
   CharUnits ComponentWidth = GetTargetTypeStoreSize(getVTableComponentType());
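The new early-return condition reads more easily in its positive form: by De Morgan's law, vtable type metadata is now emitted when any one of the three options is set. A minimal model follows; `VTableMetadataOpts` and `shouldEmitVTableTypeMetadata` are hypothetical names whose fields only stand in for the CodeGen options read in the hunk.

```cpp
#include <cassert>

// Hypothetical stand-in for the three CodeGen options read by the early
// return in CodeGenModule::EmitVTableTypeMetadata.
struct VTableMetadataOpts {
  bool LTOUnit = false;             // getCodeGenOpts().LTOUnit
  bool ProfileIRInstr = false;      // getCodeGenOpts().hasProfileIRInstr()
  bool WholeProgramVTables = false; // getCodeGenOpts().WholeProgramVTables
};

// Negation of the early-return condition:
//   !(!A && !B && !C)  ==  A || B || C
bool shouldEmitVTableTypeMetadata(const VTableMetadataOpts &O) {
  return O.LTOUnit || O.ProfileIRInstr || O.WholeProgramVTables;
}
```

The third operand is what lets a non-LTO `-fwhole-program-vtables` compile keep the vtable type metadata that WholeProgramDevirtPass later consumes.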
clang/lib/Driver/ToolChains/Clang.cpp: 8 changes (6 additions, 2 deletions)
@@ -7847,8 +7847,12 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA,
         IsDeviceOffloadAction ? D.getLTOMode() : D.getOffloadLTOMode();
     auto OtherIsUsingLTO = OtherLTOMode != LTOK_None;
 
-    if ((!IsUsingLTO && !OtherIsUsingLTO) ||
-        (IsPS4 && !UnifiedLTO && (D.getLTOMode() != LTOK_Full)))
+    if (!IsUsingLTO && !OtherIsUsingLTO && !UnifiedLTO) {
+      if (const Arg *A = Args.getLastArg(options::OPT_O_Group))
+        if (!A->getOption().matches(options::OPT_O0))
+          CmdArgs.push_back("-fwhole-program-vtables");
+    } else if ((!IsUsingLTO && !OtherIsUsingLTO) ||
+               (IsPS4 && !UnifiedLTO && (D.getLTOMode() != LTOK_Full)))
       D.Diag(diag::err_drv_argument_only_allowed_with)
           << "-fwhole-program-vtables"
           << ((IsPS4 && !UnifiedLTO) ? "-flto=full" : "-flto");
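The driver change can be summarized as a small decision function, assuming `-fwhole-program-vtables` is already enabled on the command line. `DriverState`, `WPVAction`, and `classify` are hypothetical names; `IsUsingLTO` is simplified to "the LTO mode is not None", and the final LTO branch is an assumption about code that lies outside the hunk shown.

```cpp
#include <cassert>

// Hypothetical model of the -fwhole-program-vtables driver decision;
// field names mirror the locals in Clang::ConstructJob.
enum class LTOKind { None, Full, Thin };
enum class WPVAction { ForwardToCC1, Ignore, Error };

struct DriverState {
  LTOKind LTOMode = LTOKind::None;      // D.getLTOMode()
  LTOKind OtherLTOMode = LTOKind::None; // LTO mode of the other offload side
  bool UnifiedLTO = false;
  bool IsPS4 = false;
  bool HasOptLevel = false; // some -O option was given...
  bool LastOptIsO0 = false; // ...and the last one was -O0
};

WPVAction classify(const DriverState &S) {
  const bool IsUsingLTO = S.LTOMode != LTOKind::None;
  const bool OtherIsUsingLTO = S.OtherLTOMode != LTOKind::None;

  // New non-LTO path: forward the flag only above -O0, never diagnose.
  if (!IsUsingLTO && !OtherIsUsingLTO && !S.UnifiedLTO)
    return (S.HasOptLevel && !S.LastOptIsO0) ? WPVAction::ForwardToCC1
                                             : WPVAction::Ignore;

  // Unchanged checks, e.g. -funified-lto without -flto still errors.
  if ((!IsUsingLTO && !OtherIsUsingLTO) ||
      (S.IsPS4 && !S.UnifiedLTO && S.LTOMode != LTOKind::Full))
    return WPVAction::Error;

  // Assumption: code after the hunk forwards the flag for LTO compiles.
  return IsUsingLTO ? WPVAction::ForwardToCC1 : WPVAction::Ignore;
}
```

Under these rules, the NO-LTO RUN lines in whole-program-vtables.c (which pass no `-O` option) no longer produce an error; the flag is simply not forwarded. `-funified-lto` without `-flto` still reaches the unchanged error path.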
clang/test/CodeGenCXX/devirt-single-impl.cpp: 56 changes (56 additions, 0 deletions)
@@ -0,0 +1,56 @@
+// Check that speculative devirtualization works without the need for LTO or visibility.

Review comment (Contributor):

Generally we don't test that LLVM optimizations are applied in clang tests. Better to split this into 2 tests:

  • One in clang that checks that the expected type metadata and type tests are inserted (e.g. you might want to modify clang/test/CodeGenCXX/type-metadata.cpp).
  • One in LLVM that consumes IR containing the necessary metadata & type tests, and performs the optimization. E.g. in llvm/test/Transforms/WholeProgramDevirt, using opt with either the necessary set of pass options and/or the non-LTO pass pipeline (ideally try both of these opt invocations).

+// RUN: %clang_cc1 -fwhole-program-vtables -O1 %s -emit-llvm -o - | FileCheck %s
+
+struct A {
+  A(){}
+  __attribute__((noinline))
+  virtual int virtual1(){return 20;}
+  __attribute__((noinline))
+  virtual void empty_virtual(){}
+};
+
+struct B : A {
+  B(){}
+  __attribute__((noinline))
+  virtual int virtual1() override {return 50;}
+  __attribute__((noinline))
+  virtual void empty_virtual() override {}
+};
+
+// Test that we can apply speculative devirtualization
+// without the need for LTO or visibility.
+__attribute__((noinline))
+int test_devirtual(A *a) {
+  // CHECK: %0 = load ptr, ptr %vtable, align 8
+  // CHECK-NEXT: %1 = icmp eq ptr %0, @_ZN1B8virtual1Ev
+  // CHECK-NEXT: br i1 %1, label %if.true.direct_targ, label %if.false.orig_indirect, !prof !12
+
+  // CHECK: if.true.direct_targ: ; preds = %entry
+  // CHECK-NEXT: %2 = tail call noundef i32 @_ZN1B8virtual1Ev(ptr noundef nonnull align 8 dereferenceable(8) %a)
+  // CHECK-NEXT: br label %if.end.icp
+
+  // CHECK: if.false.orig_indirect: ; preds = %entry
+  // CHECK-NEXT: %call = tail call noundef i32 %0(ptr noundef nonnull align 8 dereferenceable(8) %a)
+  // CHECK-NEXT: br label %if.end.icp
+
+  // CHECK: if.end.icp: ; preds = %if.false.orig_indirect, %if.true.direct_targ
+  // CHECK-NEXT: %3 = phi i32 [ %call, %if.false.orig_indirect ], [ %2, %if.true.direct_targ ]
+  // CHECK-NEXT: ret i32 %3
+
+  return a->virtual1();
+}
+
+// Test that we skip devirtualization for empty virtual functions, as they are
+// most probably used for interfaces.
+__attribute__((noinline))
+void test_devirtual_empty_fn(A *a) {
+  // CHECK: load ptr, ptr %vfn, align 8
+  // CHECK-NEXT: tail call void %0(ptr noundef nonnull align 8 dereferenceable(8) %a)
+  a->empty_virtual();
+}
+
+void test() {
+  A *a = new B();
+  test_devirtual(a);
+  test_devirtual_empty_fn(a);
+}
clang/test/Driver/whole-program-vtables.c: 10 changes (3 additions, 7 deletions)
@@ -1,15 +1,11 @@
-// RUN: not %clang -target x86_64-unknown-linux -fwhole-program-vtables -### %s 2>&1 | FileCheck --check-prefix=NO-LTO %s
-// RUN: not %clang_cl --target=x86_64-pc-win32 -fwhole-program-vtables -### -- %s 2>&1 | FileCheck --check-prefix=NO-LTO %s
-// NO-LTO: invalid argument '-fwhole-program-vtables' only allowed with '-flto'
+// RUN: %clang -target x86_64-unknown-linux -fwhole-program-vtables -O1 -### %s 2>&1 | FileCheck --check-prefix=WPD-NO-LTO %s
+// RUN: %clang_cl --target=x86_64-pc-win32 -fwhole-program-vtables -O1 -### -- %s 2>&1 | FileCheck --check-prefix=WPD-NO-LTO %s
+// WPD-NO-LTO: "-fwhole-program-vtables"
 
 // RUN: %clang -target x86_64-unknown-linux -fwhole-program-vtables -flto -### %s 2>&1 | FileCheck --check-prefix=LTO %s
 // RUN: not %clang_cl --target=x86_64-pc-win32 -fwhole-program-vtables -flto -### -- %s 2>&1 | FileCheck --check-prefix=LTO %s
 // LTO: "-fwhole-program-vtables"
 
-/// -funified-lto does not imply -flto, so we still get an error that fwhole-program-vtables has no effect without -flto
-// RUN: not %clang --target=x86_64-pc-linux-gnu -fwhole-program-vtables -funified-lto -### %s 2>&1 | FileCheck --check-prefix=NO-LTO %s
-// RUN: not %clang --target=x86_64-pc-linux-gnu -fwhole-program-vtables -fno-unified-lto -### %s 2>&1 | FileCheck --check-prefix=NO-LTO %s
-
 // RUN: %clang -target x86_64-unknown-linux -fwhole-program-vtables -fno-whole-program-vtables -flto -### %s 2>&1 | FileCheck --check-prefix=LTO-DISABLE %s
 // RUN: not %clang_cl --target=x86_64-pc-win32 -fwhole-program-vtables -fno-whole-program-vtables -flto -### -- %s 2>&1 | FileCheck --check-prefix=LTO-DISABLE %s
 // LTO-DISABLE-NOT: "-fwhole-program-vtables"
llvm/include/llvm/Passes/PassBuilder.h: 6 changes (6 additions, 0 deletions)
@@ -98,6 +98,12 @@ class PipelineTuningOptions {
   // analyses after various module->function or cgscc->function adaptors in the
   // default pipelines.
   bool EagerlyInvalidateAnalyses;
+
+  /// Tuning option to enable/disable whole program devirtualization.
+  /// Its default value is false.
+  /// This is controlled by Clang's `-fwhole-program-vtables` flag.
+  /// Used only in non-LTO mode.
+  bool WholeProgramDevirt;
 };
 
 /// This class provides access to building LLVM's passes.
llvm/include/llvm/Transforms/IPO/WholeProgramDevirt.h: 10 changes (7 additions, 3 deletions)
@@ -226,11 +226,15 @@ struct WholeProgramDevirtPass : public PassInfoMixin<WholeProgramDevirtPass> {
   ModuleSummaryIndex *ExportSummary;
   const ModuleSummaryIndex *ImportSummary;
   bool UseCommandLine = false;
+  const bool InLTOMode;
   WholeProgramDevirtPass()
-      : ExportSummary(nullptr), ImportSummary(nullptr), UseCommandLine(true) {}
+      : ExportSummary(nullptr), ImportSummary(nullptr), UseCommandLine(true),
+        InLTOMode(true) {}
   WholeProgramDevirtPass(ModuleSummaryIndex *ExportSummary,
-                         const ModuleSummaryIndex *ImportSummary)
-      : ExportSummary(ExportSummary), ImportSummary(ImportSummary) {
+                         const ModuleSummaryIndex *ImportSummary,
+                         bool InLTOMode = true)
+      : ExportSummary(ExportSummary), ImportSummary(ImportSummary),
+        InLTOMode(InLTOMode) {
     assert(!(ExportSummary && ImportSummary));
   }
   LLVM_ABI PreservedAnalyses run(Module &M, ModuleAnalysisManager &);
llvm/lib/Passes/PassBuilderPipelines.cpp: 18 changes (18 additions, 0 deletions)
@@ -321,6 +321,7 @@ PipelineTuningOptions::PipelineTuningOptions() {
   MergeFunctions = EnableMergeFunctions;
   InlinerThreshold = -1;
   EagerlyInvalidateAnalyses = EnableEagerlyInvalidateAnalyses;
+  WholeProgramDevirt = false;
 }
 
 namespace llvm {
@@ -1629,6 +1630,23 @@ PassBuilder::buildModuleOptimizationPipeline(OptimizationLevel Level,
   if (!LTOPreLink)
     MPM.addPass(RelLookupTableConverterPass());
 
+  if (PTO.WholeProgramDevirt && LTOPhase == ThinOrFullLTOPhase::None) {

Review comment (Contributor):

Does this need to be guarded by whether whole program vtables was enabled? We don't for LTO. What happens if it is simply always invoked for non-LTO?

+    MPM.addPass(WholeProgramDevirtPass(/*ExportSummary*/ nullptr,
+                                       /*ImportSummary*/ nullptr,
+                                       /*InLTOMode=*/false));
+    MPM.addPass(LowerTypeTestsPass(nullptr, nullptr,
+                                   lowertypetests::DropTestKind::Assume));
+    if (EnableModuleInliner) {
+      MPM.addPass(ModuleInlinerPass(getInlineParamsFromOptLevel(Level),

Review comment (Contributor):

I don't think you want to reinvoke the inliner. Better to invoke WPD from the module simplifier right before the inliner is invoked there.

+                                    UseInlineAdvisor,
+                                    ThinOrFullLTOPhase::None));
+    } else {
+      MPM.addPass(ModuleInlinerWrapperPass(
+          getInlineParamsFromOptLevel(Level),
+          /* MandatoryFirst */ true,
+          InlineContext{ThinOrFullLTOPhase::None, InlinePass::CGSCCInliner}));
+    }
+  }
   return MPM;
 }
 
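The end of `buildModuleOptimizationPipeline` after this change can be modelled as an ordered list of pass labels. `Phase`, `isPreLink`, and `pipelineTail` are hypothetical names; the strings only label the passes added in the hunk, and nothing here calls LLVM.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Mirrors the ThinOrFullLTOPhase values; the model only needs to tell
// "no LTO" apart from the pre-link and post-link phases.
enum class Phase {
  None,
  ThinLTOPreLink,
  ThinLTOPostLink,
  FullLTOPreLink,
  FullLTOPostLink
};

bool isPreLink(Phase P) {
  return P == Phase::ThinLTOPreLink || P == Phase::FullLTOPreLink;
}

// Hypothetical model of the passes appended at the end of the module
// optimization pipeline in this diff.
std::vector<std::string> pipelineTail(bool WholeProgramDevirt, Phase P,
                                      bool EnableModuleInliner) {
  std::vector<std::string> Passes;
  if (!isPreLink(P))
    Passes.push_back("RelLookupTableConverterPass");
  if (WholeProgramDevirt && P == Phase::None) {
    Passes.push_back("WholeProgramDevirtPass(InLTOMode=false)");
    Passes.push_back("LowerTypeTestsPass(DropTestKind::Assume)");
    // An inliner runs again, presumably so the new direct calls get inlined.
    Passes.push_back(EnableModuleInliner ? "ModuleInlinerPass"
                                         : "ModuleInlinerWrapperPass");
  }
  return Passes;
}
```

The model makes the reviewer's concern visible: in the non-LTO pipeline an inliner is scheduled a second time after devirtualization. The reviewer suggests instead running WPD in the module simplification pipeline, right before the inliner that already runs there.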
llvm/lib/Transforms/IPO/WholeProgramDevirt.cpp: 74 changes (59 additions, 15 deletions)
@@ -24,7 +24,8 @@
 // returns 0, or a single vtable's function returns 1, replace each virtual
 // call with a comparison of the vptr against that vtable's address.
 //
-// This pass is intended to be used during the regular and thin LTO pipelines:
+// This pass is intended to be used during the regular LTO, ThinLTO, and
+// non-LTO pipelines:
 //
 // During regular LTO, the pass determines the best optimization for each
 // virtual call and applies the resolutions directly to virtual calls that are
@@ -48,6 +49,13 @@
 //   is supported.
 // - Import phase: (same as with hybrid case above).
 //
+// In non-LTO mode:
+// - The pass applies speculative devirtualization without requiring any type
+//   of visibility.
+// - It skips the other optimizations:
+//   virtual constant propagation, uniform return value optimization,
+//   unique return value optimization, and branch funnels. This minimizes
+//   the cost of a wrong speculation.
 //===----------------------------------------------------------------------===//
 
 #include "llvm/Transforms/IPO/WholeProgramDevirt.h"
@@ -60,7 +68,9 @@
 #include "llvm/ADT/Statistic.h"
 #include "llvm/Analysis/AssumptionCache.h"
 #include "llvm/Analysis/BasicAliasAnalysis.h"
+#include "llvm/Analysis/ModuleSummaryAnalysis.h"
 #include "llvm/Analysis/OptimizationRemarkEmitter.h"
+#include "llvm/Analysis/ProfileSummaryInfo.h"
 #include "llvm/Analysis/TypeMetadataUtils.h"
 #include "llvm/Bitcode/BitcodeReader.h"
 #include "llvm/Bitcode/BitcodeWriter.h"
@@ -798,6 +808,21 @@ PreservedAnalyses WholeProgramDevirtPass::run(Module &M,
       return PreservedAnalyses::all();
     return PreservedAnalyses::none();
   }
+  std::optional<ModuleSummaryIndex> Index;
+  // Force Fallback mode: it is the safe choice in non-LTO mode, where we
+  // don't have hidden visibility.
+  if (!InLTOMode) {
+    DevirtCheckMode = WPDCheckMode::Fallback;
+    // In non-LTO mode, we don't have an ExportSummary, so we
+    // build the ExportSummary from the module.
+    assert(!ExportSummary &&
+           "ExportSummary is expected to be empty in non-LTO mode");
+    if (DevirtCheckMode == WPDCheckMode::Fallback && !ExportSummary) {
+      ProfileSummaryInfo PSI(M);
+      Index.emplace(buildModuleSummaryIndex(M, nullptr, &PSI));
+      ExportSummary = Index.has_value() ? &Index.value() : nullptr;
+    }
+  }
   if (!DevirtModule(M, AARGetter, OREGetter, LookupDomTree, ExportSummary,
                     ImportSummary)
            .run())
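In non-LTO mode the pass receives no ExportSummary, so the diff builds one with `buildModuleSummaryIndex` into a local `std::optional` and points `ExportSummary` at it. As written, that assignment targets the pass's `ExportSummary` member, which appears to be left pointing at the destroyed local once `run()` returns (and would trip the non-LTO `assert` if the same pass object ran again). The sketch below shows the same build-on-demand idea while keeping the pointer scoped to one run; `SummaryIndex`, `buildSummary`, and `exportSummaryForRun` are hypothetical stand-ins, not LLVM APIs.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical stand-ins for ModuleSummaryIndex and buildModuleSummaryIndex.
struct SummaryIndex {
  std::string ModuleName;
};
SummaryIndex buildSummary(const std::string &Module) { return {Module}; }

// Picks the export summary for one run. In non-LTO mode no summary is passed
// in, so one is built into caller-owned Storage; the returned pointer is
// valid only while Storage is alive.
SummaryIndex *exportSummaryForRun(SummaryIndex *Passed,
                                  std::optional<SummaryIndex> &Storage,
                                  const std::string &Module, bool InLTOMode) {
  if (InLTOMode)
    return Passed;
  assert(!Passed && "no export summary expected in non-LTO mode");
  // emplace() returns a reference to the new value, so no has_value()
  // check is needed afterwards.
  return &Storage.emplace(buildSummary(Module));
}
```

The caller declares `Storage` next to the code that uses the returned pointer, so both go out of scope together.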
@@ -1091,10 +1116,12 @@ bool DevirtModule::tryFindVirtualCallTargets(
     if (!TM.Bits->GV->isConstant())
       return false;
 
-    // We cannot perform whole program devirtualization analysis on a vtable
-    // with public LTO visibility.
-    if (TM.Bits->GV->getVCallVisibility() ==
-        GlobalObject::VCallVisibilityPublic)
+    // If speculative devirtualization is not enabled, it is not safe to
+    // perform whole program devirtualization analysis on a vtable with public
+    // LTO visibility.
+    if (DevirtCheckMode != WPDCheckMode::Fallback &&
+        TM.Bits->GV->getVCallVisibility() ==
+            GlobalObject::VCallVisibilityPublic)
       return false;
 
     Function *Fn = nullptr;
@@ -1112,6 +1139,11 @@ bool DevirtModule::tryFindVirtualCallTargets(
     // calls to pure virtuals are UB.
     if (Fn->getName() == "__cxa_pure_virtual")
       continue;
+    // In most cases, empty functions are overridden by the implementation in
+    // the derived class, so we can skip them.
+    if (DevirtCheckMode == WPDCheckMode::Fallback &&
+        Fn->getReturnType()->isVoidTy() && Fn->getInstructionCount() <= 1)
+      continue;
 
     // We can disregard unreachable functions as possible call targets, as
     // unreachable functions shouldn't be called.
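The two Fallback-only rules added to `tryFindVirtualCallTargets` (admitting vtables with public visibility, and dropping empty `void` targets) can be modelled as a filter over a call slot's entries. Under this model, the `empty_virtual` slot in devirt-single-impl.cpp ends up with no targets, which matches the test's expectation that the indirect call remains. `SlotEntry` and `findTargets` are hypothetical; they simplify the real code, which walks `TypeMemberInfo`s and `llvm::Function`s and performs further checks omitted here.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified view of one vtable entry for a call slot.
struct SlotEntry {
  bool VTableIsPublic;       // vtable has VCallVisibilityPublic
  std::string FnName;        // function stored in the slot
  bool ReturnsVoid;          // Fn->getReturnType()->isVoidTy()
  unsigned InstructionCount; // Fn->getInstructionCount()
};

// Simplified model of the filtering after this change; constant-vtable and
// unreachable-function checks are omitted. Returns false if no target is
// usable.
bool findTargets(const std::vector<SlotEntry> &Entries, bool Fallback,
                 std::vector<std::string> &Targets) {
  for (const SlotEntry &E : Entries) {
    // Outside Fallback mode, a public vtable makes the analysis unsafe.
    if (!Fallback && E.VTableIsPublic)
      return false;
    // Calls to pure virtuals are UB, so they are not real targets.
    if (E.FnName == "__cxa_pure_virtual")
      continue;
    // Fallback mode: skip empty void functions (a lone `ret void`),
    // on the assumption that derived classes override them.
    if (Fallback && E.ReturnsVoid && E.InstructionCount <= 1)
      continue;
    Targets.push_back(E.FnName);
  }
  return !Targets.empty();
}
```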
@@ -1333,10 +1365,11 @@ bool DevirtModule::trySingleImplDevirt(
   if (!IsExported)
     return false;
 
-  // If the only implementation has local linkage, we must promote to external
-  // to make it visible to thin LTO objects. We can only get here during the
-  // ThinLTO export phase.
-  if (TheFn->hasLocalLinkage()) {
+  // For non-speculative devirtualization: if the only implementation has
+  // local linkage, we must promote it to external to make it visible to
+  // ThinLTO objects. We can only get here during the ThinLTO export
+  // phase.
+  if (DevirtCheckMode != WPDCheckMode::Fallback && TheFn->hasLocalLinkage()) {
     std::string NewName = (TheFn->getName() + ".llvm.merged").str();
 
     // Since we are renaming the function, any comdats with the same name must
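In the non-LTO pipeline, the direct call created by speculative devirtualization presumably stays in the same module as its target, so a target with local linkage needs no renaming; the diff skips the `.llvm.merged` promotion whenever `DevirtCheckMode` is `Fallback`. A minimal model of that guard, using the hypothetical helper `promotedName`:

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical model of the renaming guard in trySingleImplDevirt: returns the
// promoted external name, or std::nullopt when the function is left as is.
std::optional<std::string> promotedName(const std::string &FnName,
                                        bool HasLocalLinkage, bool Fallback) {
  // Fallback (speculative) mode never promotes; otherwise only functions
  // with local linkage need a new, externally visible name.
  if (Fallback || !HasLocalLinkage)
    return std::nullopt;
  return FnName + ".llvm.merged";
}
```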
@@ -2315,6 +2348,11 @@ bool DevirtModule::run() {
 
   Function *TypeTestFunc =
       Intrinsic::getDeclarationIfExists(&M, Intrinsic::type_test);
+  // If we are applying speculative devirtualization, we can work on the public
+  // type test intrinsics.
+  if (!TypeTestFunc && DevirtCheckMode == WPDCheckMode::Fallback)
+    TypeTestFunc =
+        Intrinsic::getDeclarationIfExists(&M, Intrinsic::public_type_test);
   Function *TypeCheckedLoadFunc =
       Intrinsic::getDeclarationIfExists(&M, Intrinsic::type_checked_load);
   Function *TypeCheckedLoadRelativeFunc = Intrinsic::getDeclarationIfExists(
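Clang emits `llvm.public.type.test` rather than `llvm.type.test` for classes without hidden LTO visibility, which is the common case in a non-LTO build, so Fallback mode needs a way to reach those calls. The hunk only consults the public intrinsic when the module declares no `llvm.type.test` at all. A minimal model of that choice, where `typeTestToScan` is a hypothetical helper returning the IR intrinsic name (or an empty string when there is nothing to scan):

```cpp
#include <cassert>
#include <string>

// Hypothetical model of which type-test intrinsic DevirtModule::run() scans,
// given which declarations exist in the module.
std::string typeTestToScan(bool HasTypeTest, bool HasPublicTypeTest,
                           bool Fallback) {
  if (HasTypeTest)
    return "llvm.type.test";
  // Only consulted when llvm.type.test is absent, and only in Fallback mode.
  if (Fallback && HasPublicTypeTest)
    return "llvm.public.type.test";
  return "";
}
```

One consequence of this per-module choice: if a module declares both intrinsics, only the `llvm.type.test` calls are considered.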
@@ -2437,12 +2475,18 @@ bool DevirtModule::run() {
                 .WPDRes[S.first.ByteOffset];
     if (tryFindVirtualCallTargets(TargetsForSlot, TypeMemberInfos,
                                   S.first.ByteOffset, ExportSummary)) {
-
-      if (!trySingleImplDevirt(ExportSummary, TargetsForSlot, S.second, Res)) {
-        DidVirtualConstProp |=
-            tryVirtualConstProp(TargetsForSlot, S.second, Res, S.first);
-
-        tryICallBranchFunnel(TargetsForSlot, S.second, Res, S.first);
+      trySingleImplDevirt(ExportSummary, TargetsForSlot, S.second, Res);
+      // In speculative devirtualization mode, we skip virtual constant
+      // propagation and branch funneling to minimize the cost of a wrong
+      // speculation.
+      if (DevirtCheckMode != WPDCheckMode::Fallback) {
+        if (!trySingleImplDevirt(ExportSummary, TargetsForSlot, S.second,
+                                 Res)) {
+          DidVirtualConstProp |=
+              tryVirtualConstProp(TargetsForSlot, S.second, Res, S.first);
+
+          tryICallBranchFunnel(TargetsForSlot, S.second, Res, S.first);
+        }
       }
 
       // Collect functions devirtualized at least for one call site for stats.
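The intended per-slot dispatch can be modelled as below, with each step run at most once per slot. In the hunk as shown, the unconditional `trySingleImplDevirt(...)` call sits before the `DevirtCheckMode != WPDCheckMode::Fallback` check, so outside Fallback mode `trySingleImplDevirt` appears to run twice for the same slot; this sketch places the call inside the two branches instead. `devirtSlot` and its logging lambdas are hypothetical stand-ins for the real `try*` methods.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of the per-slot dispatch in DevirtModule::run(): each
// step appends its name to Log; SingleImplApplies fakes its result.
void devirtSlot(bool Fallback, bool SingleImplApplies,
                std::vector<std::string> &Log) {
  auto trySingleImpl = [&] {
    Log.push_back("single-impl");
    return SingleImplApplies;
  };
  auto tryVirtualConstProp = [&] { Log.push_back("virtual-const-prop"); };
  auto tryBranchFunnel = [&] { Log.push_back("branch-funnel"); };

  if (Fallback) {
    // Speculative mode: single-implementation devirtualization only.
    trySingleImpl();
    return;
  }
  // LTO mode: the other optimizations run only when single-impl fails.
  if (!trySingleImpl()) {
    tryVirtualConstProp();
    tryBranchFunnel();
  }
}
```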