
[llvm-readobj][COFF] Implement --coff-pseudoreloc in llvm-readobj to dump runtime pseudo-relocation records #151816



Open
wants to merge 6 commits into
base: main
from


kikairoya
Contributor

The MinGW toolchain uses a "runtime pseudo-relocation" mechanism to support auto-importing symbols from DLLs.
There is no commonly used tool for dumping the pseudo-relocation records, so we implement that functionality in llvm-readobj.
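For reference, here is a minimal standalone C++ sketch of decoding the version-2 record list from the raw bytes between the list-begin and list-end symbols. It follows the layout the patch assumes: a 12-byte header of 0, 0, 1, then 12-byte {Symbol, Target, BitSize} records, all 32-bit. The names PseudoRelocRecord, readLE32 and parsePseudoRelocsV2 are hypothetical, and little-endian input is assumed.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical record type: all three fields are 32-bit values, as in the patch.
struct PseudoRelocRecord {
  uint32_t Symbol;  // RVA of the IAT slot holding the imported address
  uint32_t Target;  // RVA of the location to patch
  uint32_t BitSize; // width of the patched field in bits
};

// Read a little-endian uint32 without relying on host endianness or alignment.
static uint32_t readLE32(const uint8_t *P) {
  return uint32_t(P[0]) | uint32_t(P[1]) << 8 | uint32_t(P[2]) << 16 |
         uint32_t(P[3]) << 24;
}

// Decode a version-2 list: header {0, 0, 1} followed by 12-byte records.
// Returns std::nullopt if the header is missing or does not match.
std::optional<std::vector<PseudoRelocRecord>>
parsePseudoRelocsV2(const uint8_t *Data, size_t Size) {
  constexpr size_t HeaderSize = 12, RecordSize = 12;
  if (Size < HeaderSize || readLE32(Data) != 0 || readLE32(Data + 4) != 0 ||
      readLE32(Data + 8) != 1)
    return std::nullopt;
  std::vector<PseudoRelocRecord> Records;
  // A trailing partial record is ignored, like the integer division in the patch.
  for (size_t Off = HeaderSize; Off + RecordSize <= Size; Off += RecordSize)
    Records.push_back({readLE32(Data + Off), readLE32(Data + Off + 4),
                       readLE32(Data + Off + 8)});
  return Records;
}
```

Reading each field byte by byte avoids the alignment and host-endianness assumptions that come with reinterpret_cast-ing the section bytes to a struct.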

@llvmbot
Member

llvmbot commented Aug 2, 2025

@llvm/pr-subscribers-llvm-binary-utilities

Author: Tomohiro Kashiwada (kikairoya)

Changes

The MinGW toolchain uses a "runtime pseudo-relocation" mechanism to support auto-importing symbols from DLLs.
There is no commonly used tool for dumping the pseudo-relocation records, so we implement that functionality in llvm-readobj.


Full diff: https://github.com/llvm/llvm-project/pull/151816.diff

6 Files Affected:

  • (added) llvm/test/tools/llvm-readobj/COFF/Inputs/pseudoreloc.exe ()
  • (added) llvm/test/tools/llvm-readobj/COFF/pseudoreloc.test (+97)
  • (modified) llvm/tools/llvm-readobj/COFFDumper.cpp (+109)
  • (modified) llvm/tools/llvm-readobj/ObjDumper.h (+1)
  • (modified) llvm/tools/llvm-readobj/Opts.td (+3)
  • (modified) llvm/tools/llvm-readobj/llvm-readobj.cpp (+4)
diff --git a/llvm/test/tools/llvm-readobj/COFF/Inputs/pseudoreloc.exe b/llvm/test/tools/llvm-readobj/COFF/Inputs/pseudoreloc.exe
new file mode 100644
index 0000000000000..d4106e99d96f3
Binary files /dev/null and b/llvm/test/tools/llvm-readobj/COFF/Inputs/pseudoreloc.exe differ
diff --git a/llvm/test/tools/llvm-readobj/COFF/pseudoreloc.test b/llvm/test/tools/llvm-readobj/COFF/pseudoreloc.test
new file mode 100644
index 0000000000000..f3db464b4ae69
--- /dev/null
+++ b/llvm/test/tools/llvm-readobj/COFF/pseudoreloc.test
@@ -0,0 +1,97 @@
+RUN: llvm-readobj --coff-pseudoreloc %p/Inputs/pseudoreloc.exe | FileCheck %s
+RUN: llvm-readobj --coff-pseudoreloc %p/Inputs/nop.exe.coff-x86-64 | FileCheck %s --check-prefix=NOSYM
+RUN: llvm-readobj --coff-pseudoreloc %p/Inputs/trivial.obj.coff-i386 | FileCheck %s --check-prefix=NORELOC
+
+CHECK:      Format: COFF-i386
+CHECK-NEXT: Arch: i386
+CHECK-NEXT: AddressSize: 32bit
+CHECK-NEXT: PseudoReloc [
+CHECK-NEXT:  Entry {
+CHECK-NEXT:   Symbol: 0x{{[0-9A-Z]+}}
+CHECK-NEXT:   SymbolName: sym1
+CHECK-NEXT:   Target: 0x{{[0-9A-Z]+}}
+CHECK-NEXT:   BitWidth: {{[0-9]+}}
+CHECK-NEXT:  }
+CHECK-NEXT:  Entry {
+CHECK-NEXT:   Symbol: 0x{{[0-9A-Z]+}}
+CHECK-NEXT:   SymbolName: sym2
+CHECK-NEXT:   Target: 0x{{[0-9A-Z]+}}
+CHECK-NEXT:   BitWidth: {{[0-9]+}}
+CHECK-NEXT:  }
+CHECK-NEXT:  Entry {
+CHECK-NEXT:   Symbol: 0x{{[0-9A-Z]+}}
+CHECK-NEXT:   SymbolName: sym1
+CHECK-NEXT:   Target: 0x{{[0-9A-Z]+}}
+CHECK-NEXT:   BitWidth: {{[0-9]+}}
+CHECK-NEXT:  }
+CHECK-NEXT: ]
+
+NOSYM-NOT: PseudoReloc
+NOSYM: The symbol table has been stripped
+NOSYM-NOT: PseudoReloc
+
+NORELOC-NOT: PseudoReloc
+NORELOC: The symbols for runtime pseudo-relocation are not found
+NORELOC-NOT: PseudoReloc
+
+
+pseudoreloc.exe is generated by following script:
+
+#--- generate.sh
+llvm-mc -triple i386-mingw32 -filetype obj pseudoreloc.dll.s -o pseudoreloc.dll.o
+ld.lld -m i386pe --dll pseudoreloc.dll.o -o pseudoreloc.dll -entry=
+llvm-mc -triple i386-mingw32 -filetype obj pseudoreloc.s -o pseudoreloc.o
+ld.lld -m i386pe pseudoreloc.o pseudoreloc.dll -o pseudoreloc.exe -entry=start
+
+#--- pseudoreloc.dll.s
+    .data
+    .globl _sym1
+_sym1:
+    .long 0x11223344
+    .globl _sym2
+_sym2:
+    .long 0x55667788
+    .section .drectve
+    .ascii " -export:sym1,data "
+    .ascii " -export:sym2,data "
+    .addrsig
+
+#--- pseudoreloc.s
+    .text
+    .globl _start
+_start:
+    mov _local1b, %eax
+    movsb (%eax), %ecx
+    mov _local2, %eax
+    movsb (%eax), %edx
+    mov _local1a, %eax
+    movsb (%eax), %eax
+    add %edx, %eax
+    add %ecx, %eax
+    ret
+
+    .globl __pei386_runtime_relocator
+__pei386_runtime_relocator:
+    mov ___RUNTIME_PSEUDO_RELOC_LIST__, %eax
+    mov ___RUNTIME_PSEUDO_RELOC_LIST_END__, %ecx
+    sub %ecx, %eax
+    ret
+
+    .data
+    .globl  _local1a
+    .p2align 2
+_local1a:
+    .long _sym1+1
+
+    .globl _local2
+    .p2align 2
+_local2:
+    .long _sym2+1
+
+    .globl  _local1b
+    .p2align 2
+_local1b:
+    .long _sym1+3
+
+    .addrsig
+
diff --git a/llvm/tools/llvm-readobj/COFFDumper.cpp b/llvm/tools/llvm-readobj/COFFDumper.cpp
index 96e0a634648e4..45ca018b714f2 100644
--- a/llvm/tools/llvm-readobj/COFFDumper.cpp
+++ b/llvm/tools/llvm-readobj/COFFDumper.cpp
@@ -95,6 +95,7 @@ class COFFDumper : public ObjDumper {
   void printCOFFExports() override;
   void printCOFFDirectives() override;
   void printCOFFBaseReloc() override;
+  void printCOFFPseudoReloc() override;
   void printCOFFDebugDirectory() override;
   void printCOFFTLSDirectory() override;
   void printCOFFResources() override;
@@ -2000,6 +2001,114 @@ void COFFDumper::printCOFFBaseReloc() {
   }
 }
 
+void COFFDumper::printCOFFPseudoReloc() {
+  const StringRef RelocBeginName = Obj->getArch() == Triple::x86
+                                       ? "___RUNTIME_PSEUDO_RELOC_LIST__"
+                                       : "__RUNTIME_PSEUDO_RELOC_LIST__";
+  const StringRef RelocEndName = Obj->getArch() == Triple::x86
+                                     ? "___RUNTIME_PSEUDO_RELOC_LIST_END__"
+                                     : "__RUNTIME_PSEUDO_RELOC_LIST_END__";
+
+  COFFSymbolRef RelocBegin, RelocEnd;
+  auto Count = Obj->getNumberOfSymbols();
+  if (Count == 0) {
+    W.startLine() << "The symbol table has been stripped\n";
+    return;
+  }
+  for (auto i = 0u;
+       i < Count && (!RelocBegin.getRawPtr() || !RelocEnd.getRawPtr()); ++i) {
+    auto Sym = Obj->getSymbol(i);
+    if (Sym.takeError())
+      continue;
+    auto Name = Obj->getSymbolName(*Sym);
+    if (Name.takeError())
+      continue;
+    if (*Name == RelocBeginName) {
+      if (Sym->getSectionNumber() > 0)
+        RelocBegin = *Sym;
+    } else if (*Name == RelocEndName) {
+      if (Sym->getSectionNumber() > 0)
+        RelocEnd = *Sym;
+    }
+  }
+  if (!RelocBegin.getRawPtr() || !RelocEnd.getRawPtr()) {
+    W.startLine()
+        << "The symbols for runtime pseudo-relocation are not found\n";
+    return;
+  }
+
+  ArrayRef<uint8_t> Data;
+  auto Section = Obj->getSection(RelocBegin.getSectionNumber());
+  if (auto E = Section.takeError()) {
+    reportError(std::move(E), Obj->getFileName());
+    return;
+  }
+  if (auto E = Obj->getSectionContents(*Section, Data)) {
+    reportError(std::move(E), Obj->getFileName());
+    return;
+  }
+  ArrayRef<uint8_t> RawRelocs =
+      Data.take_front(RelocEnd.getValue()).drop_front(RelocBegin.getValue());
+  struct alignas(4) PseudoRelocationHeader {
+    uint32_t Zero1;
+    uint32_t Zero2;
+    uint32_t Signature;
+  };
+  static const PseudoRelocationHeader HeaderV2 = {0, 0, 1};
+  if (RawRelocs.size() < sizeof(HeaderV2) ||
+      (memcmp(RawRelocs.data(), &HeaderV2, sizeof(HeaderV2)) != 0)) {
+    reportWarning(
+        createStringError("Invalid runtime pseudo-relocation records"),
+        Obj->getFileName());
+    return;
+  }
+  struct alignas(4) PseudoRelocationRecord {
+    uint32_t Symbol;
+    uint32_t Target;
+    uint32_t BitSize;
+  };
+  ArrayRef<PseudoRelocationRecord> RelocRecords(
+      reinterpret_cast<const PseudoRelocationRecord *>(
+          RawRelocs.data() + sizeof(PseudoRelocationHeader)),
+      (RawRelocs.size() - sizeof(PseudoRelocationHeader)) /
+          sizeof(PseudoRelocationRecord));
+
+  // Cache of symbol searched at least once in IAT
+  DenseMap<uint32_t, StringRef> ImportedSymbols;
+
+  ListScope D(W, "PseudoReloc");
+  for (const auto &Reloc : RelocRecords) {
+    DictScope Entry(W, "Entry");
+    W.printHex("Symbol", Reloc.Symbol);
+
+    // find and print the pointed symbol from IAT
+    [&]() {
+      for (auto D : Obj->import_directories()) {
+        uint32_t RVA;
+        if (auto E = D.getImportAddressTableRVA(RVA))
+          reportError(std::move(E), Obj->getFileName());
+        if (Reloc.Symbol < RVA)
+          continue;
+        for (auto S : D.imported_symbols()) {
+          if (RVA == Reloc.Symbol) {
+            if (auto E = S.getSymbolName(ImportedSymbols[RVA]))
+              reportError(std::move(E), Obj->getFileName());
+            return;
+          }
+          RVA += Obj->is64() ? 8 : 4;
+        }
+      }
+    }();
+    if (auto Ite = ImportedSymbols.find(Reloc.Symbol);
+        Ite != ImportedSymbols.end()) {
+      W.printString("SymbolName", Ite->second);
+    }
+
+    W.printHex("Target", Reloc.Target);
+    W.printNumber("BitWidth", Reloc.BitSize);
+  }
+}
+
 void COFFDumper::printCOFFResources() {
   ListScope ResourcesD(W, "Resources");
   for (const SectionRef &S : Obj->sections()) {
diff --git a/llvm/tools/llvm-readobj/ObjDumper.h b/llvm/tools/llvm-readobj/ObjDumper.h
index 1dc29661f7178..a654078a770ff 100644
--- a/llvm/tools/llvm-readobj/ObjDumper.h
+++ b/llvm/tools/llvm-readobj/ObjDumper.h
@@ -146,6 +146,7 @@ class ObjDumper {
   virtual void printCOFFExports() { }
   virtual void printCOFFDirectives() { }
   virtual void printCOFFBaseReloc() { }
+  virtual void printCOFFPseudoReloc() {}
   virtual void printCOFFDebugDirectory() { }
   virtual void printCOFFTLSDirectory() {}
   virtual void printCOFFResources() {}
diff --git a/llvm/tools/llvm-readobj/Opts.td b/llvm/tools/llvm-readobj/Opts.td
index 48d43cc635a4f..d519e34a72983 100644
--- a/llvm/tools/llvm-readobj/Opts.td
+++ b/llvm/tools/llvm-readobj/Opts.td
@@ -82,6 +82,9 @@ def codeview_ghash : FF<"codeview-ghash", "Enable global hashing for CodeView ty
 def codeview_merged_types : FF<"codeview-merged-types", "Display the merged CodeView type stream">, Group<grp_coff>;
 def codeview_subsection_bytes : FF<"codeview-subsection-bytes", "Dump raw contents of codeview debug sections and records">, Group<grp_coff>;
 def coff_basereloc : FF<"coff-basereloc", "Display .reloc section">, Group<grp_coff>;
+def coff_pseudoreloc
+    : FF<"coff-pseudoreloc", "Display runtime pseudo-relocations">,
+      Group<grp_coff>;
 def coff_debug_directory : FF<"coff-debug-directory", "Display debug directory">, Group<grp_coff>;
 def coff_directives : FF<"coff-directives", "Display .drectve section">, Group<grp_coff>;
 def coff_exports : FF<"coff-exports", "Display export table">, Group<grp_coff>;
diff --git a/llvm/tools/llvm-readobj/llvm-readobj.cpp b/llvm/tools/llvm-readobj/llvm-readobj.cpp
index 4c84ed701bb9a..2b34761b2cc6c 100644
--- a/llvm/tools/llvm-readobj/llvm-readobj.cpp
+++ b/llvm/tools/llvm-readobj/llvm-readobj.cpp
@@ -154,6 +154,7 @@ static bool CodeViewEnableGHash;
 static bool CodeViewMergedTypes;
 bool CodeViewSubsectionBytes;
 static bool COFFBaseRelocs;
+static bool COFFPseudoRelocs;
 static bool COFFDebugDirectory;
 static bool COFFDirectives;
 static bool COFFExports;
@@ -305,6 +306,7 @@ static void parseOptions(const opt::InputArgList &Args) {
   opts::CodeViewMergedTypes = Args.hasArg(OPT_codeview_merged_types);
   opts::CodeViewSubsectionBytes = Args.hasArg(OPT_codeview_subsection_bytes);
   opts::COFFBaseRelocs = Args.hasArg(OPT_coff_basereloc);
+  opts::COFFPseudoRelocs = Args.hasArg(OPT_coff_pseudoreloc);
   opts::COFFDebugDirectory = Args.hasArg(OPT_coff_debug_directory);
   opts::COFFDirectives = Args.hasArg(OPT_coff_directives);
   opts::COFFExports = Args.hasArg(OPT_coff_exports);
@@ -492,6 +494,8 @@ static void dumpObject(ObjectFile &Obj, ScopedPrinter &Writer,
       Dumper->printCOFFDirectives();
     if (opts::COFFBaseRelocs)
       Dumper->printCOFFBaseReloc();
+    if (opts::COFFPseudoRelocs)
+      Dumper->printCOFFPseudoReloc();
     if (opts::COFFDebugDirectory)
       Dumper->printCOFFDebugDirectory();
     if (opts::COFFTLSDirectory)

@kikairoya
Contributor Author

While investigating #149639, we used the script previously written by @mstorsjo and @jeremyd2019 for dumping pseudo-relocs.
So that script has already been used at least twice, and I bet a third time will come too.

Member

@mstorsjo left a comment

Thanks for working on this! It's very much appreciated. The script has indeed been useful a number of times. On some of those occasions I also considered writing something like this. In most of the cases where I needed it, though, I wanted to find pseudo relocations that were too narrow (32-bit pseudo relocations in 64-bit binaries), and I wanted more precise diagnostics about where they came from (which I ask about in a review comment here as well). In the end, I implemented that warning in LLD instead, in 6daa4b9.
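To illustrate the "too narrow" case, here is a minimal sketch. It assumes the runtime patches a field by replacing the IAT slot address the field originally contains with the imported address stored in that slot, and that a result is accepted if it fits in the field as either a signed or an unsigned integer. Both rules are assumptions for illustration, not a description of the exact mingw-w64 runtime or LLD logic; adjustedValue and fitsInBitWidth are hypothetical names.

```cpp
#include <cstdint>

// Hypothetical helper: the value a v2 pseudo-relocation would write back,
// assuming the field originally holds (IAT slot address + addend) and the
// runtime swaps the slot address for the imported address stored there.
// Unsigned arithmetic keeps any wraparound well-defined.
int64_t adjustedValue(int64_t Original, uint64_t SlotVA, uint64_t ImportedAddr) {
  return static_cast<int64_t>(static_cast<uint64_t>(Original) - SlotVA +
                              ImportedAddr);
}

// Assumed acceptance rule: the value must fit in BitSize bits, read either as
// a signed or as an unsigned integer. BitSize is expected to be 8, 16, 32 or 64.
bool fitsInBitWidth(int64_t Value, unsigned BitSize) {
  if (BitSize == 0)
    return false;
  if (BitSize >= 64)
    return true;
  const int64_t MinSigned = -(int64_t(1) << (BitSize - 1));
  const int64_t MaxUnsigned = (int64_t(1) << BitSize) - 1;
  return Value >= MinSigned && Value <= MaxUnsigned;
}
```

With a DLL mapped above 4 GiB in a 64-bit process, adjustedValue(0x1001, 0x1000, 0x7ff812340000) yields 0x7ff812340001, which fitsInBitWidth rejects for a 32-bit field.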

@@ -0,0 +1,97 @@
RUN: llvm-readobj --coff-pseudoreloc %p/Inputs/pseudoreloc.exe | FileCheck %s
Member

If possible, it'd be nicer to synthesize the test binary from YAML with yaml2obj rather than checking in a binary. (We do have some binaries checked in from before, but we'd like to keep that to a minimum, i.e. only for binaries that yaml2obj can't synthesize yet.)

I see that you have included the full procedure for regenerating the binary; that's nice and appreciated! When converting to YAML, it's also somewhat customary to reduce the file size by removing unnecessary data. Perhaps that's not needed in this case, if the payload of each section is only a couple dozen bytes anyway. But if it is, the instructions would unfortunately have to end with obj2yaml pseudoreloc.exe > pseudoreloc.exe.yaml # and manually strip down the .yaml file. If there isn't much unnecessary data in there, though, perhaps we don't need to strip it manually at all.

CHECK-NEXT: }
CHECK-NEXT: ]

NOSYM-NOT: PseudoReloc
Member

Instead of repeating the -NOT check before and after the positive check we do look for, you could consider FileCheck's --implicit-check-not option.

CHECK-NEXT: Symbol: 0x{{[0-9A-Z]+}}
CHECK-NEXT: SymbolName: sym1
CHECK-NEXT: Target: 0x{{[0-9A-Z]+}}
CHECK-NEXT: BitWidth: {{[0-9]+}}
Member

While these regex patterns accept anything we'd expect to output, they also make the test rather weak: we could start printing wrong addresses (or bit widths) without the test catching it, which is not ideal. So I think it would be good to check the actual numbers as well, so that the test catches any potential breakage.

pseudoreloc.exe is generated by following script:

#--- generate.sh
llvm-mc -triple i386-mingw32 -filetype obj pseudoreloc.dll.s -o pseudoreloc.dll.o
Member

Perhaps include a split-file command here as well, for easier execution if one wants to try it?

@@ -82,6 +82,9 @@ def codeview_ghash : FF<"codeview-ghash", "Enable global hashing for CodeView ty
def codeview_merged_types : FF<"codeview-merged-types", "Display the merged CodeView type stream">, Group<grp_coff>;
def codeview_subsection_bytes : FF<"codeview-subsection-bytes", "Dump raw contents of codeview debug sections and records">, Group<grp_coff>;
def coff_basereloc : FF<"coff-basereloc", "Display .reloc section">, Group<grp_coff>;
def coff_pseudoreloc
: FF<"coff-pseudoreloc", "Display runtime pseudo-relocations">,
Member

Perhaps this should include the word "mingw" in the option description as well, as this isn't relevant for general PE-COFF?

i < Count && (!RelocBegin.getRawPtr() || !RelocEnd.getRawPtr()); ++i) {
auto Sym = Obj->getSymbol(i);
if (Sym.takeError())
continue;
Member

IIRC you can't just ignore the errors like this (the error classes have a destructor that aborts if you haven't actually done anything with the error). See other similar functions here for ways of handling it; e.g. if (!Sym) { consumeError(Sym.takeError()); return; } if we just want to ignore the error, or if (!Sym) reportError(Sym.takeError(), Obj->getFileName()); if we want to print it.

These error classes are quite tricky to use in that sense, so ideally every one of these error paths would be exercised by a test. Unfortunately, it can probably be quite hard to force them to fail. Perhaps by hex-editing a binary so that symbol string offsets are out of bounds?

return;
}
ArrayRef<uint8_t> RawRelocs =
Data.take_front(RelocEnd.getValue()).drop_front(RelocBegin.getValue());
Member

Before computing the range between RelocBegin and RelocEnd, we'd need to validate that they indeed point into the same section.
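One way to express that validation, as a sketch over a simplified symbol view (SymbolLoc and pseudoRelocRange are hypothetical, not llvm-readobj API): both symbols must be defined in the same section, in order, and within the section contents. The ordering check matters too: in an assertions-enabled build, ArrayRef::drop_front asserts when asked to drop more elements than the array holds, which is what a reversed begin/end pair would do in the take_front/drop_front slicing in the patch.

```cpp
#include <cstdint>
#include <optional>
#include <utility>

// Hypothetical, simplified view of a COFF symbol: section number (1-based for
// defined symbols) and its value, i.e. the offset within that section.
struct SymbolLoc {
  int32_t SectionNumber;
  uint32_t Value;
};

// Returns the [Begin, End) byte range of the list within the section contents,
// or std::nullopt if the two symbols cannot describe a valid range.
std::optional<std::pair<uint32_t, uint32_t>>
pseudoRelocRange(SymbolLoc Begin, SymbolLoc End, uint64_t SectionSize) {
  if (Begin.SectionNumber <= 0 || Begin.SectionNumber != End.SectionNumber)
    return std::nullopt; // not defined, or defined in different sections
  if (Begin.Value > End.Value || End.Value > SectionSize)
    return std::nullopt; // reversed or out-of-bounds range
  return std::make_pair(Begin.Value, End.Value);
}
```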

const PseudoRelocationHeader HeaderV2(1);
if (RawRelocs.size() < sizeof(HeaderV2) ||
(memcmp(RawRelocs.data(), &HeaderV2, sizeof(HeaderV2)) != 0)) {
reportWarning(
Member

If RawRelocs.size() == 0, we should probably print a different message here. (I'm not sure whether the linker ever actually produces that, or what the runtime would do with it.)
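A sketch of telling an empty list apart from a malformed one. PseudoRelocListKind and classifyPseudoRelocList are hypothetical names, and whether a linker ever emits an empty list is left open, as in the comment. The sketch compares against an explicit little-endian byte pattern rather than a host-endian struct, so the check does not depend on the host byte order.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical classification of the bytes between the begin/end symbols.
enum class PseudoRelocListKind { Empty, Invalid, V2 };

PseudoRelocListKind classifyPseudoRelocList(const uint8_t *Data, size_t Size) {
  if (Size == 0)
    return PseudoRelocListKind::Empty; // could get its own, distinct message
  // Version-2 header {0, 0, 1} as three little-endian uint32 values.
  static const uint8_t HeaderV2[12] = {0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0};
  if (Size < sizeof(HeaderV2) ||
      std::memcmp(Data, HeaderV2, sizeof(HeaderV2)) != 0)
    return PseudoRelocListKind::Invalid;
  return PseudoRelocListKind::V2;
}
```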

if (auto E = D.getImportAddressTableRVA(RVA))
reportError(std::move(E), Obj->getFileName());
if (EntryRVA < RVA)
continue;
Member

Is there a similar check we could do, like if (EntryRVA > RVA + ImportAddressTableSize), to avoid iterating over the table when we can easily see that the address is past the end of this table?
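A sketch of such a bounds check. It assumes the number of imported symbols in each import directory is known up front (for example by counting imported_symbols() once per directory) and that slots are pointer-sized, 4 or 8 bytes, as in the patch's Obj->is64() check. iatSlotIndex is a hypothetical helper that turns the linear walk into a range test plus a division.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical helper: given the start RVA of one import directory's IAT, the
// number of imported symbols it holds, and the slot size (4 or 8), return the
// index of the slot at SymbolRVA, or std::nullopt if it is not in this table.
std::optional<uint32_t> iatSlotIndex(uint32_t IATStartRVA, uint32_t NumSymbols,
                                     uint32_t PtrSize, uint32_t SymbolRVA) {
  if (SymbolRVA < IATStartRVA)
    return std::nullopt; // before this table
  const uint64_t Offset = uint64_t(SymbolRVA) - IATStartRVA;
  if (Offset >= uint64_t(NumSymbols) * PtrSize || Offset % PtrSize != 0)
    return std::nullopt; // past the end of this table, or not slot-aligned
  return static_cast<uint32_t>(Offset / PtrSize);
}
```

The returned index could then select the matching entry of imported_symbols() directly, so only the one symbol name that is actually needed gets looked up.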

ListScope D(W, "PseudoReloc");
for (const auto &Reloc : RelocRecords) {
DictScope Entry(W, "Entry");
W.printHex("Symbol", Reloc.Symbol);
Member

Another nice-to-have feature here, probably out of scope for the initial version at least, would be to figure out which data block the target belongs to (e.g. which function or variable contains the pseudo relocation). But since COFF symbols only have an offset and no size, we probably can't do that easily.
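Since COFF symbols have no size, a cheap approximation is to report the nearest defined symbol at or before the target RVA. This sketch assumes a hypothetical list of (RVA, name) pairs, sorted by RVA, has been built from the symbol table beforehand. The result is only a hint, because the target may lie past the end of that symbol's data.

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>
#include <optional>
#include <string>
#include <vector>

// Hypothetical (RVA, name) pair for a defined symbol.
struct NamedRVA {
  uint32_t RVA;
  std::string Name;
};

// Best-effort guess: the closest symbol at or before Target.
// Sorted must be ordered by ascending RVA.
std::optional<NamedRVA> nearestPrecedingSymbol(const std::vector<NamedRVA> &Sorted,
                                               uint32_t Target) {
  // First symbol that starts strictly after Target.
  auto It = std::upper_bound(
      Sorted.begin(), Sorted.end(), Target,
      [](uint32_t T, const NamedRVA &S) { return T < S.RVA; });
  if (It == Sorted.begin())
    return std::nullopt; // Target precedes every symbol
  return *std::prev(It);
}
```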

3 participants