[IR2Vec][NFC] Add helper methods for numeric ID mapping in Vocabulary #149212

Merged: svkeerthy merged 1 commit into main on Jul 17, 2025

@svkeerthy svkeerthy commented Jul 16, 2025

Add helper methods to IR2Vec's Vocabulary class for numeric ID mapping and vocabulary size calculation. These APIs will be useful for triplet generation in the llvm-ir2vec tool (see #149214).

(Tracking issue - #141817)
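
For readers skimming the diff, here is a minimal standalone sketch of the ID layout these helpers imply: opcodes occupy the first block of slots, type IDs the next, and operand kinds the last. It does not use LLVM headers. The namespace `layout_sketch`, the function names, and the values of `MaxOpcodes` and `MaxTypeIDs` are placeholders chosen for illustration, not LLVM's actual names or values; `MaxOperandKinds = 4` follows the four kinds exercised in the unit test.

```cpp
#include <cassert>
#include <cstddef>

// Toy model of the vocabulary ID layout. Not the LLVM implementation.
namespace layout_sketch {
constexpr unsigned MaxOpcodes = 5;      // hypothetical placeholder value
constexpr unsigned MaxTypeIDs = 3;      // hypothetical placeholder value
constexpr unsigned MaxOperandKinds = 4; // Function, Pointer, Constant, Variable

// Total number of slots across the three sections.
constexpr std::size_t expectedSize() {
  return MaxOpcodes + MaxTypeIDs + MaxOperandKinds;
}

// Opcodes are 1-based, so opcode N lands in slot N - 1.
inline unsigned opcodeID(unsigned Opcode) {
  assert(Opcode >= 1 && Opcode <= MaxOpcodes && "Invalid opcode");
  return Opcode - 1;
}

// Type IDs are 0-based and start right after the opcode section.
inline unsigned typeID(unsigned TypeID) {
  assert(TypeID < MaxTypeIDs && "Invalid type ID");
  return MaxOpcodes + TypeID;
}

// Operand kinds are 0-based and start after the type section.
inline unsigned operandKindID(unsigned Kind) {
  assert(Kind < MaxOperandKinds && "Invalid OperandKind");
  return MaxOpcodes + MaxTypeIDs + Kind;
}
} // namespace layout_sketch
```

The sketch uses three distinctly named functions because its inputs are all plain `unsigned`; the PR instead overloads `getNumericID` on `unsigned`, `Type::TypeID`, and `const Value *`.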


@svkeerthy svkeerthy changed the title exposing-new-methods [IR2Vec] Add helper methods for numeric ID mapping in Vocabulary Jul 16, 2025
@svkeerthy svkeerthy changed the title [IR2Vec] Add helper methods for numeric ID mapping in Vocabulary [IR2Vec][NFC] Add helper methods for numeric ID mapping in Vocabulary Jul 16, 2025
@svkeerthy svkeerthy marked this pull request as ready for review July 16, 2025 22:54
@llvmbot llvmbot added mlgo llvm:analysis Includes value tracking, cost tables and constant folding labels Jul 16, 2025

llvmbot commented Jul 16, 2025

@llvm/pr-subscribers-llvm-analysis

@llvm/pr-subscribers-mlgo

Author: S. VenkataKeerthy (svkeerthy)


Full diff: https://github.com/llvm/llvm-project/pull/149212.diff

3 Files Affected:

  • (modified) llvm/include/llvm/Analysis/IR2Vec.h (+9)
  • (modified) llvm/lib/Analysis/IR2Vec.cpp (+18-2)
  • (modified) llvm/unittests/Analysis/IR2VecTest.cpp (+63)
diff --git a/llvm/include/llvm/Analysis/IR2Vec.h b/llvm/include/llvm/Analysis/IR2Vec.h
index 3d7edf08c8807..d87457cac7642 100644
--- a/llvm/include/llvm/Analysis/IR2Vec.h
+++ b/llvm/include/llvm/Analysis/IR2Vec.h
@@ -170,6 +170,10 @@ class Vocabulary {
   unsigned getDimension() const;
   size_t size() const;
 
+  static size_t expectedSize() {
+    return MaxOpcodes + MaxTypeIDs + MaxOperandKinds;
+  }
+
   /// Helper function to get vocabulary key for a given Opcode
   static StringRef getVocabKeyForOpcode(unsigned Opcode);
 
@@ -182,6 +186,11 @@ class Vocabulary {
   /// Helper function to classify an operand into OperandKind
   static OperandKind getOperandKind(const Value *Op);
 
+  /// Helpers to return the IDs of a given Opcode, TypeID, or OperandKind
+  static unsigned getNumericID(unsigned Opcode);
+  static unsigned getNumericID(Type::TypeID TypeID);
+  static unsigned getNumericID(const Value *Op);
+
   /// Accessors to get the embedding for a given entity.
   const ir2vec::Embedding &operator[](unsigned Opcode) const;
   const ir2vec::Embedding &operator[](Type::TypeID TypeId) const;
diff --git a/llvm/lib/Analysis/IR2Vec.cpp b/llvm/lib/Analysis/IR2Vec.cpp
index 898bf5b202feb..95f30fd3f4275 100644
--- a/llvm/lib/Analysis/IR2Vec.cpp
+++ b/llvm/lib/Analysis/IR2Vec.cpp
@@ -215,7 +215,7 @@ Vocabulary::Vocabulary(VocabVector &&Vocab)
     : Vocab(std::move(Vocab)), Valid(true) {}
 
 bool Vocabulary::isValid() const {
-  return Vocab.size() == (MaxOpcodes + MaxTypeIDs + MaxOperandKinds) && Valid;
+  return Vocab.size() == Vocabulary::expectedSize() && Valid;
 }
 
 size_t Vocabulary::size() const {
@@ -324,8 +324,24 @@ Vocabulary::OperandKind Vocabulary::getOperandKind(const Value *Op) {
   return OperandKind::VariableID;
 }
 
+unsigned Vocabulary::getNumericID(unsigned Opcode) {
+  assert(Opcode >= 1 && Opcode <= MaxOpcodes && "Invalid opcode");
+  return Opcode - 1; // Convert to zero-based index
+}
+
+unsigned Vocabulary::getNumericID(Type::TypeID TypeID) {
+  assert(static_cast<unsigned>(TypeID) < MaxTypeIDs && "Invalid type ID");
+  return MaxOpcodes + static_cast<unsigned>(TypeID);
+}
+
+unsigned Vocabulary::getNumericID(const Value *Op) {
+  unsigned Index = static_cast<unsigned>(getOperandKind(Op));
+  assert(Index < MaxOperandKinds && "Invalid OperandKind");
+  return MaxOpcodes + MaxTypeIDs + Index;
+}
+
 StringRef Vocabulary::getStringKey(unsigned Pos) {
-  assert(Pos < MaxOpcodes + MaxTypeIDs + MaxOperandKinds &&
+  assert(Pos < Vocabulary::expectedSize() &&
          "Position out of bounds in vocabulary");
   // Opcode
   if (Pos < MaxOpcodes)
diff --git a/llvm/unittests/Analysis/IR2VecTest.cpp b/llvm/unittests/Analysis/IR2VecTest.cpp
index cb6d633306a81..7c9a5464bfe1d 100644
--- a/llvm/unittests/Analysis/IR2VecTest.cpp
+++ b/llvm/unittests/Analysis/IR2VecTest.cpp
@@ -396,6 +396,69 @@ TEST(IR2VecVocabularyTest, DummyVocabTest) {
   }
 }
 
+TEST(IR2VecVocabularyTest, NumericIDMap) {
+  // Test getNumericID for opcodes
+  EXPECT_EQ(Vocabulary::getNumericID(1u), 0u);
+  EXPECT_EQ(Vocabulary::getNumericID(13u), 12u);
+  EXPECT_EQ(Vocabulary::getNumericID(MaxOpcodes), MaxOpcodes - 1);
+
+  // Test getNumericID for Type IDs
+  EXPECT_EQ(Vocabulary::getNumericID(Type::VoidTyID),
+            MaxOpcodes + static_cast<unsigned>(Type::VoidTyID));
+  EXPECT_EQ(Vocabulary::getNumericID(Type::HalfTyID),
+            MaxOpcodes + static_cast<unsigned>(Type::HalfTyID));
+  EXPECT_EQ(Vocabulary::getNumericID(Type::FloatTyID),
+            MaxOpcodes + static_cast<unsigned>(Type::FloatTyID));
+  EXPECT_EQ(Vocabulary::getNumericID(Type::IntegerTyID),
+            MaxOpcodes + static_cast<unsigned>(Type::IntegerTyID));
+  EXPECT_EQ(Vocabulary::getNumericID(Type::PointerTyID),
+            MaxOpcodes + static_cast<unsigned>(Type::PointerTyID));
+
+  // Test getNumericID for Value operands
+  LLVMContext Ctx;
+  Module M("TestM", Ctx);
+  FunctionType *FTy =
+      FunctionType::get(Type::getVoidTy(Ctx), {Type::getInt32Ty(Ctx)}, false);
+  Function *F = Function::Create(FTy, Function::ExternalLinkage, "testFunc", M);
+
+  // Test Function operand
+  EXPECT_EQ(Vocabulary::getNumericID(F),
+            MaxOpcodes + MaxTypeIDs + 0u); // Function = 0
+
+  // Test Constant operand
+  Constant *C = ConstantInt::get(Type::getInt32Ty(Ctx), 42);
+  EXPECT_EQ(Vocabulary::getNumericID(C),
+            MaxOpcodes + MaxTypeIDs + 2u); // Constant = 2
+
+  // Test Pointer operand
+  BasicBlock *BB = BasicBlock::Create(Ctx, "entry", F);
+  AllocaInst *PtrVal = new AllocaInst(Type::getInt32Ty(Ctx), 0, "ptr", BB);
+  EXPECT_EQ(Vocabulary::getNumericID(PtrVal),
+            MaxOpcodes + MaxTypeIDs + 1u); // Pointer = 1
+
+  // Test Variable operand (function argument)
+  Argument *Arg = F->getArg(0);
+  EXPECT_EQ(Vocabulary::getNumericID(Arg),
+            MaxOpcodes + MaxTypeIDs + 3u); // Variable = 3
+}
+
+#if GTEST_HAS_DEATH_TEST
+#ifndef NDEBUG
+TEST(IR2VecVocabularyTest, NumericIDMapInvalidInputs) {
+  // Test invalid opcode IDs
+  EXPECT_DEATH(Vocabulary::getNumericID(0u), "Invalid opcode");
+  EXPECT_DEATH(Vocabulary::getNumericID(MaxOpcodes + 1), "Invalid opcode");
+
+  // Test invalid type IDs
+  EXPECT_DEATH(Vocabulary::getNumericID(static_cast<Type::TypeID>(MaxTypeIDs)),
+               "Invalid type ID");
+  EXPECT_DEATH(
+      Vocabulary::getNumericID(static_cast<Type::TypeID>(MaxTypeIDs + 10)),
+      "Invalid type ID");
+}
+#endif // NDEBUG
+#endif // GTEST_HAS_DEATH_TEST
+
 TEST(IR2VecVocabularyTest, StringKeyGeneration) {
   EXPECT_EQ(Vocabulary::getStringKey(0), "Ret");
   EXPECT_EQ(Vocabulary::getStringKey(12), "Add");
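
Since the new asserts compile out under `NDEBUG` (which is why the death test above is guarded), a consumer of these IDs may want a non-asserting way to tell which section an ID belongs to. The following is a standalone sketch of that reverse lookup, following the same section order that `getStringKey` begins with (`Pos < MaxOpcodes` first). The namespace `section_sketch`, the `Section` enum, `Decoded`, and `decode` are hypothetical names, and the opcode and type counts are placeholder values; none of this is part of the PR.

```cpp
#include <cassert>

// Toy reverse mapping from a flat ID to (section, offset). Not LLVM code.
namespace section_sketch {
constexpr unsigned MaxOpcodes = 5;      // hypothetical placeholder value
constexpr unsigned MaxTypeIDs = 3;      // hypothetical placeholder value
constexpr unsigned MaxOperandKinds = 4; // four kinds, as in the unit test

enum class Section { Opcode, Type, OperandKind, OutOfRange };

struct Decoded {
  Section Sec;
  unsigned Offset; // zero-based position inside the section
};

// Walk the sections in layout order, subtracting each section's size.
inline Decoded decode(unsigned ID) {
  if (ID < MaxOpcodes)
    return Decoded{Section::Opcode, ID};
  ID -= MaxOpcodes;
  if (ID < MaxTypeIDs)
    return Decoded{Section::Type, ID};
  ID -= MaxTypeIDs;
  if (ID < MaxOperandKinds)
    return Decoded{Section::OperandKind, ID};
  // Report the error as a value instead of asserting, so release
  // builds behave the same as debug builds.
  return Decoded{Section::OutOfRange, 0};
}
} // namespace section_sketch
```

With these placeholder sizes, `decode(8)` reports the first operand kind and `decode(12)` reports `OutOfRange`.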

@svkeerthy svkeerthy force-pushed the users/svkeerthy/07-14-_ir2vec_adding_documentation_for_llvm-ir2vec_tool branch from 6ae5021 to 3ad45e3 Compare July 16, 2025 23:32
@svkeerthy svkeerthy force-pushed the users/svkeerthy/07-16-exposing-new-methods branch 2 times, most recently from bc03736 to 68ae9f5 Compare July 16, 2025 23:46
@svkeerthy svkeerthy force-pushed the users/svkeerthy/07-14-_ir2vec_adding_documentation_for_llvm-ir2vec_tool branch 2 times, most recently from 42671b8 to a395af5 Compare July 17, 2025 18:04
@svkeerthy svkeerthy force-pushed the users/svkeerthy/07-16-exposing-new-methods branch from 68ae9f5 to 1d7ca80 Compare July 17, 2025 18:04
@svkeerthy svkeerthy force-pushed the users/svkeerthy/07-14-_ir2vec_adding_documentation_for_llvm-ir2vec_tool branch from a395af5 to 586947a Compare July 17, 2025 19:07
Base automatically changed from users/svkeerthy/07-14-_ir2vec_adding_documentation_for_llvm-ir2vec_tool to main July 17, 2025 19:09
@svkeerthy svkeerthy force-pushed the users/svkeerthy/07-16-exposing-new-methods branch 2 times, most recently from 01c6091 to f24c6f1 Compare July 17, 2025 19:55

svkeerthy commented Jul 17, 2025

Merge activity

  • Jul 17, 8:36 PM UTC: A user started a stack merge that includes this pull request via Graphite.
  • Jul 17, 8:38 PM UTC: Graphite rebased this pull request as part of a merge.
  • Jul 17, 8:40 PM UTC: @svkeerthy merged this pull request with Graphite.

@svkeerthy svkeerthy force-pushed the users/svkeerthy/07-16-exposing-new-methods branch from f24c6f1 to faf9baa Compare July 17, 2025 20:37
@svkeerthy svkeerthy merged commit 61a45d2 into main Jul 17, 2025
7 of 9 checks passed
@svkeerthy svkeerthy deleted the users/svkeerthy/07-16-exposing-new-methods branch July 17, 2025 20:40
@boomanaiden154

@svkeerthy This didn't get reviewed at all?

@svkeerthy

Right. I pushed it because it was a minor refactoring. Feel free to add any comments; I will address them.


llvm-ci commented Jul 17, 2025

LLVM Buildbot has detected a new failure on builder lldb-x86_64-debian running on lldb-x86_64-debian while building llvm at step 6 "test".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/162/builds/27073

Here is the relevant piece of the build log for reference:
Step 6 (test) failure: build (failure)
...
UNSUPPORTED: lldb-shell :: Register/riscv64-gp-read.test (3086 of 3097)
UNSUPPORTED: lldb-shell :: ScriptInterpreter/Python/Crashlog/app_specific_backtrace_crashlog.test (3087 of 3097)
UNSUPPORTED: lldb-shell :: ScriptInterpreter/Python/Crashlog/interactive_crashlog_arm64_register.test (3088 of 3097)
UNSUPPORTED: lldb-shell :: ScriptInterpreter/Lua/independent_state.test (3089 of 3097)
UNSUPPORTED: lldb-shell :: ScriptInterpreter/Python/Crashlog/altered_threadState.test (3090 of 3097)
UNSUPPORTED: lldb-shell :: ScriptInterpreter/Lua/bindings.test (3091 of 3097)
UNSUPPORTED: lldb-shell :: Register/aarch64-gp-read.test (3092 of 3097)
PASS: lldb-api :: api/multithreaded/TestMultithreaded.py (3093 of 3097)
PASS: lldb-api :: terminal/TestEditlineCompletions.py (3094 of 3097)
UNRESOLVED: lldb-api :: tools/lldb-dap/launch/TestDAP_launch.py (3095 of 3097)
******************** TEST 'lldb-api :: tools/lldb-dap/launch/TestDAP_launch.py' FAILED ********************
Script:
--
/usr/bin/python3 /home/worker/2.0.1/lldb-x86_64-debian/llvm-project/lldb/test/API/dotest.py -u CXXFLAGS -u CFLAGS --env LLVM_LIBS_DIR=/home/worker/2.0.1/lldb-x86_64-debian/build/./lib --env LLVM_INCLUDE_DIR=/home/worker/2.0.1/lldb-x86_64-debian/build/include --env LLVM_TOOLS_DIR=/home/worker/2.0.1/lldb-x86_64-debian/build/./bin --arch x86_64 --build-dir /home/worker/2.0.1/lldb-x86_64-debian/build/lldb-test-build.noindex --lldb-module-cache-dir /home/worker/2.0.1/lldb-x86_64-debian/build/lldb-test-build.noindex/module-cache-lldb/lldb-api --clang-module-cache-dir /home/worker/2.0.1/lldb-x86_64-debian/build/lldb-test-build.noindex/module-cache-clang/lldb-api --executable /home/worker/2.0.1/lldb-x86_64-debian/build/./bin/lldb --compiler /home/worker/2.0.1/lldb-x86_64-debian/build/./bin/clang --dsymutil /home/worker/2.0.1/lldb-x86_64-debian/build/./bin/dsymutil --make /usr/bin/gmake --llvm-tools-dir /home/worker/2.0.1/lldb-x86_64-debian/build/./bin --lldb-obj-root /home/worker/2.0.1/lldb-x86_64-debian/build/tools/lldb --lldb-libs-dir /home/worker/2.0.1/lldb-x86_64-debian/build/./lib --cmake-build-type Release -t /home/worker/2.0.1/lldb-x86_64-debian/llvm-project/lldb/test/API/tools/lldb-dap/launch -p TestDAP_launch.py
--
Exit Code: 1

Command Output (stdout):
--
lldb version 22.0.0git (https://github.com/llvm/llvm-project.git revision 61a45d20cfe7f93ec1b73dc2dd776f493af2a7cc)
  clang revision 61a45d20cfe7f93ec1b73dc2dd776f493af2a7cc
  llvm revision 61a45d20cfe7f93ec1b73dc2dd776f493af2a7cc
Skipping the following test categories: ['libc++', 'msvcstl', 'dsym', 'gmodules', 'debugserver', 'objc']

--
Command Output (stderr):
--
Change dir to: /home/worker/2.0.1/lldb-x86_64-debian/llvm-project/lldb/test/API/tools/lldb-dap/launch
runCmd: settings clear --all

output: 

runCmd: settings set symbols.enable-external-lookup false

output: 

runCmd: settings set target.inherit-tcc true

output: 

runCmd: settings set target.disable-aslr false

output: 

runCmd: settings set target.detach-on-error false

output: 

runCmd: settings set target.auto-apply-fixits false
