Skip to content

[MLIR] emitc: Add fmtArgs to verbatim #123294

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Feb 18, 2025

Conversation

mgehre-amd
Copy link
Contributor

Allows to print code snippets that refer to arguments or local variables. E.g. emitc.verbatim "#pragma abc var={}" args %arg0 : !emitc.ptr<i32> is printed as #pragma abc var=v1 when the translator had decided to print %arg0 as v1.

As a follow-up PR, we will use the same infra to extend opaque type, which provides a way to generate template types depending on the spelling of other types.

@llvmbot
Copy link
Member

llvmbot commented Jan 17, 2025

@llvm/pr-subscribers-mlir-emitc

@llvm/pr-subscribers-mlir

Author: Matthias Gehre (mgehre-amd)

Changes

Allows to print code snippets that refer to arguments or local variables. E.g. emitc.verbatim "#pragma abc var={}" args %arg0 : !emitc.ptr&lt;i32&gt; is printed as #pragma abc var=v1 when the translator had decided to print %arg0 as v1.

As a follow-up PR, we will use the same infra to extend opaque type, which provides a way to generate template types depending on the spelling of other types.


Full diff: https://github.com/llvm/llvm-project/pull/123294.diff

7 Files Affected:

  • (modified) mlir/include/mlir/Dialect/EmitC/IR/EmitC.h (+6)
  • (modified) mlir/include/mlir/Dialect/EmitC/IR/EmitC.td (+21-2)
  • (modified) mlir/lib/Dialect/EmitC/IR/EmitC.cpp (+108)
  • (modified) mlir/lib/Target/Cpp/TranslateToCpp.cpp (+15-1)
  • (modified) mlir/test/Dialect/EmitC/invalid_ops.mlir (+48)
  • (modified) mlir/test/Dialect/EmitC/ops.mlir (+15)
  • (modified) mlir/test/Target/Cpp/verbatim.mlir (+26-2)
diff --git a/mlir/include/mlir/Dialect/EmitC/IR/EmitC.h b/mlir/include/mlir/Dialect/EmitC/IR/EmitC.h
index 87a4078f280f65..57029c64ffd007 100644
--- a/mlir/include/mlir/Dialect/EmitC/IR/EmitC.h
+++ b/mlir/include/mlir/Dialect/EmitC/IR/EmitC.h
@@ -27,6 +27,8 @@
 #include "mlir/Dialect/EmitC/IR/EmitCDialect.h.inc"
 #include "mlir/Dialect/EmitC/IR/EmitCEnums.h.inc"
 
+#include <variant>
+
 namespace mlir {
 namespace emitc {
 void buildTerminatedBody(OpBuilder &builder, Location loc);
@@ -47,6 +49,10 @@ bool isSupportedFloatType(mlir::Type type);
 /// Determines whether \p type is a emitc.size_t/ssize_t type.
 bool isPointerWideType(mlir::Type type);
 
+// Either a literal string, or an placeholder for the fmtArgs.
+struct Placeholder {};
+using ReplacementItem = std::variant<StringRef, Placeholder>;
+
 } // namespace emitc
 } // namespace mlir
 
diff --git a/mlir/include/mlir/Dialect/EmitC/IR/EmitC.td b/mlir/include/mlir/Dialect/EmitC/IR/EmitC.td
index b16f5a8619fe7b..8d1361edf33349 100644
--- a/mlir/include/mlir/Dialect/EmitC/IR/EmitC.td
+++ b/mlir/include/mlir/Dialect/EmitC/IR/EmitC.td
@@ -1198,10 +1198,29 @@ def EmitC_VerbatimOp : EmitC_Op<"verbatim"> {
     }
     #endif
     ```
+
+    If the `emitc.verbatim` op has operands, then the `value` is interpreted as
+    format string, where `{}` is a placeholder for an operand in their order.
+    For example, `emitc.verbatim "#pragma my src={} dst={}" %src, %dest : i32, i32`
+    would be emitted as `#pragma my src=a dst=b` if `%src` became `a` and
+    `%dest` became `b` in the C code.
+    `{{` in the format string is interpreted as a single `{` and doesn't introduce
+    a placeholder.
   }];
 
-  let arguments = (ins StrAttr:$value);
-  let assemblyFormat = "$value attr-dict";
+  let extraClassDeclaration = [{
+    FailureOr<SmallVector<::mlir::emitc::ReplacementItem>> parseFormatString();
+  }];
+
+  let arguments = (ins StrAttr:$value, Variadic<EmitCType>:$fmtArgs);
+
+  let builders = [OpBuilder<(ins "::mlir::StringAttr":$value),
+                            [{ build($_builder, $_state, value, {}); }]>];
+  let builders = [OpBuilder<(ins "::llvm::StringRef":$value),
+                            [{ build($_builder, $_state, value, {}); }]>];
+  let hasVerifier = 1;
+  let assemblyFormat =
+      "$value (`args` $fmtArgs^ `:` type($fmtArgs))? attr-dict";
 }
 
 def EmitC_AssignOp : EmitC_Op<"assign", []> {
diff --git a/mlir/lib/Dialect/EmitC/IR/EmitC.cpp b/mlir/lib/Dialect/EmitC/IR/EmitC.cpp
index fdc21d6c6e24b9..095712a9c89094 100644
--- a/mlir/lib/Dialect/EmitC/IR/EmitC.cpp
+++ b/mlir/lib/Dialect/EmitC/IR/EmitC.cpp
@@ -19,6 +19,7 @@
 #include "llvm/ADT/StringExtras.h"
 #include "llvm/ADT/TypeSwitch.h"
 #include "llvm/Support/Casting.h"
+#include "llvm/Support/FormatVariadic.h"
 
 using namespace mlir;
 using namespace mlir::emitc;
@@ -167,6 +168,64 @@ static LogicalResult verifyInitializationAttribute(Operation *op,
   return success();
 }
 
+/// Parse a format string and return a list of its parts.
+/// A part is either a StringRef that has to be printed as-is, or
+/// a Placeholder which requires printing the next operand of the VerbatimOp.
+/// In the format string, all `{}` are replaced by Placeholders, except if the
+/// `{` is escaped by `{{` - then it doesn't start a placeholder.
+template <class ArgType>
+FailureOr<SmallVector<ReplacementItem>>
+parseFormatString(StringRef toParse, ArgType fmtArgs,
+                  std::optional<llvm::function_ref<mlir::InFlightDiagnostic()>>
+                      emitError = {}) {
+  SmallVector<ReplacementItem> items;
+
+  // If there are not operands, the format string is not interpreted.
+  if (fmtArgs.empty()) {
+    items.push_back(toParse);
+    return items;
+  }
+
+  while (!toParse.empty()) {
+    size_t idx = toParse.find('{');
+    if (idx == StringRef::npos) {
+      // No '{'
+      items.push_back(toParse);
+      break;
+    }
+    if (idx > 0) {
+      // Take all chars excluding the '{'.
+      items.push_back(toParse.take_front(idx));
+      toParse = toParse.drop_front(idx);
+      continue;
+    }
+    if (toParse.size() < 2) {
+      // '{' is last character
+      items.push_back(toParse);
+      break;
+    }
+    // toParse contains at least two characters and starts with `{`.
+    char nextChar = toParse[1];
+    if (nextChar == '{') {
+      // Double '{{' -> '{' (escaping).
+      items.push_back(toParse.take_front(1));
+      toParse = toParse.drop_front(2);
+      continue;
+    }
+    if (nextChar == '}') {
+      items.push_back(Placeholder{});
+      toParse = toParse.drop_front(2);
+      continue;
+    }
+
+    if (emitError.has_value()) {
+      return (*emitError)() << "expected '}' after unescaped '{'";
+    }
+    return failure();
+  }
+  return items;
+}
+
 //===----------------------------------------------------------------------===//
 // AddOp
 //===----------------------------------------------------------------------===//
@@ -909,6 +968,55 @@ LogicalResult emitc::SubscriptOp::verify() {
   return success();
 }
 
+//===----------------------------------------------------------------------===//
+// VerbatimOp
+//===----------------------------------------------------------------------===//
+
+LogicalResult emitc::VerbatimOp::verify() {
+  auto errorCallback = [&]() -> InFlightDiagnostic {
+    return this->emitOpError();
+  };
+  FailureOr<SmallVector<ReplacementItem>> fmt =
+      ::parseFormatString(getValue(), getFmtArgs(), errorCallback);
+  if (failed(fmt))
+    return failure();
+
+  size_t numPlaceholders = llvm::count_if(*fmt, [](ReplacementItem &item) {
+    return std::holds_alternative<Placeholder>(item);
+  });
+
+  if (numPlaceholders != getFmtArgs().size()) {
+    return emitOpError()
+           << "requires operands for each placeholder in the format string";
+  }
+  return success();
+}
+
+static ParseResult parseVariadicTypeFmtArgs(AsmParser &p,
+                                            SmallVector<Type> &params) {
+  Type type;
+  if (p.parseType(type))
+    return failure();
+
+  params.push_back(type);
+  while (succeeded(p.parseOptionalComma())) {
+    if (p.parseType(type))
+      return failure();
+    params.push_back(type);
+  }
+
+  return success();
+}
+
+static void printVariadicTypeFmtArgs(AsmPrinter &p, ArrayRef<Type> params) {
+  llvm::interleaveComma(params, p, [&](Type type) { p.printType(type); });
+}
+
+FailureOr<SmallVector<ReplacementItem>> emitc::VerbatimOp::parseFormatString() {
+  // Error checking is done in verify.
+  return ::parseFormatString(getValue(), getFmtArgs());
+}
+
 //===----------------------------------------------------------------------===//
 // EmitC Enums
 //===----------------------------------------------------------------------===//
diff --git a/mlir/lib/Target/Cpp/TranslateToCpp.cpp b/mlir/lib/Target/Cpp/TranslateToCpp.cpp
index a91f5ab9311401..af5261ff392f72 100644
--- a/mlir/lib/Target/Cpp/TranslateToCpp.cpp
+++ b/mlir/lib/Target/Cpp/TranslateToCpp.cpp
@@ -556,7 +556,21 @@ static LogicalResult printOperation(CppEmitter &emitter,
                                     emitc::VerbatimOp verbatimOp) {
   raw_ostream &os = emitter.ostream();
 
-  os << verbatimOp.getValue();
+  FailureOr<SmallVector<ReplacementItem>> items =
+      verbatimOp.parseFormatString();
+  if (failed(items))
+    return failure();
+
+  auto fmtArg = verbatimOp.getFmtArgs().begin();
+
+  for (ReplacementItem &item : *items) {
+    if (auto *str = std::get_if<StringRef>(&item)) {
+      os << *str;
+    } else {
+      if (failed(emitter.emitOperand(*fmtArg++)))
+        return failure();
+    }
+  }
 
   return success();
 }
diff --git a/mlir/test/Dialect/EmitC/invalid_ops.mlir b/mlir/test/Dialect/EmitC/invalid_ops.mlir
index a0d8d7f59de115..46998b1d52deaf 100644
--- a/mlir/test/Dialect/EmitC/invalid_ops.mlir
+++ b/mlir/test/Dialect/EmitC/invalid_ops.mlir
@@ -566,3 +566,51 @@ func.func @emitc_switch() {
   }
   return
 }
+
+// -----
+
+func.func @test_verbatim(%arg0 : !emitc.ptr<i32>, %arg1 : i32) {
+  // expected-error @+1 {{'emitc.verbatim' op requires operands for each placeholder in the format string}}
+  emitc.verbatim "" args %arg0, %arg1 : !emitc.ptr<i32>, i32
+  return
+}
+
+// -----
+
+func.func @test_verbatim(%arg0 : !emitc.ptr<i32>, %arg1 : i32) {
+  // expected-error @+1 {{'emitc.verbatim' op requires operands for each placeholder in the format string}}
+  emitc.verbatim "abc" args %arg0, %arg1 : !emitc.ptr<i32>, i32
+  return
+}
+
+// -----
+
+func.func @test_verbatim(%arg0 : !emitc.ptr<i32>, %arg1 : i32) {
+  // expected-error @+1 {{'emitc.verbatim' op requires operands for each placeholder in the format string}}
+  emitc.verbatim "{}" args %arg0, %arg1 : !emitc.ptr<i32>, i32
+  return
+}
+
+// -----
+
+func.func @test_verbatim(%arg0 : !emitc.ptr<i32>, %arg1 : i32) {
+  // expected-error @+1 {{'emitc.verbatim' op requires operands for each placeholder in the format string}}
+  emitc.verbatim "{} {} {}" args %arg0, %arg1 : !emitc.ptr<i32>, i32
+  return
+}
+
+// -----
+
+func.func @test_verbatim(%arg0 : !emitc.ptr<i32>, %arg1 : i32) {
+  // expected-error @+1 {{'emitc.verbatim' op expected '}' after unescaped '{'}}
+  emitc.verbatim "{ " args %arg0, %arg1 : !emitc.ptr<i32>, i32
+  return
+}
+
+// -----
+
+func.func @test_verbatim(%arg0 : !emitc.ptr<i32>, %arg1 : i32) {
+  // expected-error @+1 {{'emitc.verbatim' op expected '}' after unescaped '{'}}
+  emitc.verbatim "{a} " args %arg0, %arg1 : !emitc.ptr<i32>, i32
+  return
+}
diff --git a/mlir/test/Dialect/EmitC/ops.mlir b/mlir/test/Dialect/EmitC/ops.mlir
index 7fd0a2d020397b..6056b6121f1e8b 100644
--- a/mlir/test/Dialect/EmitC/ops.mlir
+++ b/mlir/test/Dialect/EmitC/ops.mlir
@@ -238,6 +238,21 @@ emitc.verbatim "#endif  // __cplusplus"
 emitc.verbatim "typedef int32_t i32;"
 emitc.verbatim "typedef float f32;"
 
+// The value is not interpreted as format string if there are no operands.
+emitc.verbatim "{} {  }"
+
+func.func @test_verbatim(%arg0 : !emitc.ptr<i32>, %arg1 : i32) {
+  emitc.verbatim "{} + {};" args %arg0, %arg1 : !emitc.ptr<i32>, i32
+
+  // Trailing '{' are ok and don't start a placeholder.
+  emitc.verbatim "{} + {} {" args %arg0, %arg1 : !emitc.ptr<i32>, i32
+
+  // Check there is no ambiguity whether %a is the argument to the emitc.verbatim op.
+  emitc.verbatim "a"
+  %a = "emitc.constant"(){value = 42 : i32} : () -> i32
+
+  return
+}
 
 emitc.global @uninit : i32
 emitc.global @myglobal_int : i32 = 4
diff --git a/mlir/test/Target/Cpp/verbatim.mlir b/mlir/test/Target/Cpp/verbatim.mlir
index 10465dd781a81d..050187048c3638 100644
--- a/mlir/test/Target/Cpp/verbatim.mlir
+++ b/mlir/test/Target/Cpp/verbatim.mlir
@@ -1,5 +1,5 @@
-// RUN: mlir-translate -mlir-to-cpp %s | FileCheck %s
-// RUN: mlir-translate -mlir-to-cpp -declare-variables-at-top %s | FileCheck %s
+// RUN: mlir-translate -mlir-to-cpp %s | FileCheck %s --match-full-lines
+// RUN: mlir-translate -mlir-to-cpp -declare-variables-at-top %s | FileCheck %s --match-full-lines
 
 
 emitc.verbatim "#ifdef __cplusplus"
@@ -19,3 +19,27 @@ emitc.verbatim "typedef int32_t i32;"
 // CHECK-NEXT: typedef int32_t i32;
 emitc.verbatim "typedef float f32;"
 // CHECK-NEXT: typedef float f32;
+
+emitc.func @func(%arg: f32) {
+  // CHECK: void func(float [[V0:[^ ]*]]) {
+  %a = "emitc.variable"(){value = #emitc.opaque<"">} : () -> !emitc.array<3x7xi32>
+  // CHECK: int32_t [[A:[^ ]*]][3][7];
+
+  emitc.verbatim "{}" args %arg : f32
+  // CHECK: [[V0]]
+
+  emitc.verbatim "{} {{a" args %arg : f32
+  // CHECK-NEXT: [[V0]] {a
+
+  emitc.verbatim "#pragma my var={} property" args %arg : f32
+  // CHECK-NEXT: #pragma my var=[[V0]] property
+
+  // Trailing '{' are printed as-is.
+  emitc.verbatim "#pragma my var={} {" args %arg : f32
+  // CHECK-NEXT: #pragma my var=[[V0]] {
+
+  emitc.verbatim "#pragma my2 var={} property" args %a : !emitc.array<3x7xi32>
+  // CHECK-NEXT: #pragma my2 var=[[A]] property
+
+  emitc.return
+}

@simon-camp
Copy link
Contributor

I have two high level questions before reviewing in detail.

Can you show a case on how you intend to use this feature. It looks like this can be used (and abused) to do all kinds of side effecting operations to MLIR values, and I'm not sure how this interacts with the whole system.

emitc.verbatim "{}++;" args %arg0 : i32

I'm not directly against it, but I think discussions about it might be helpful.

For the generalization of opaque types; Do you use it for other things than templated types? I have considered having a template type like !emitc.template<"std::vector", [i32]>. This would be more structured than string formatting, but maybe it is too limiting for your use cases.

If we agree to land this, can we somehow back the formatting on llvm utilities?

@mgehre-amd
Copy link
Contributor Author

mgehre-amd commented Jan 30, 2025

I have two high level questions before reviewing in detail.

Can you show a case on how you intend to use this feature. It looks like this can be used (and abused) to do all kinds of side effecting operations to MLIR values, and I'm not sure how this interacts with the whole system.

Yes, verbatim is a sharp knife and you can stab yourself with it. Our main applications is to generate #pragmas my_own pragma varable=v1 that refer to local variables. Similarly, I guess it can be useful to call macros on local variables.

For the generalization of opaque types; Do you use it for other things than templated types? I have considered having a template type like !emitc.template<"std::vector", [i32]>. This would be more structured than string formatting, but maybe it is too limiting for your use cases.

Yes, we use it to generate template instantiations. !emitc.template would also work, but we found not much benefit over !emitc.verbatim because template types don't have more semantic meaning than verbatim types; it's just a more structured way to do the text replacement. On the other hand !emitc.opaque with arguments is strictly more powerful, so it's useful to have this first.

If we agree to land this, can we somehow back the formatting on llvm utilities?

I initially implemented it based on that. There is one main problem: the formatv functions expect the format string to be correct, and otherwise assert. Here, the format string is part of the user input and should lead to IR verification failure but not to an assert.
This lead me to implement a simple grammar for the replacement.

Copy link
Contributor

@simon-camp simon-camp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks good to me 👍

Allows to print code snippets that refer to arguments or local variables.
E.g. `emitc.verbatim "#pragma abc var={}" args %arg0 : !emitc.ptr<i32>` is printed as `#pragma abc var=v1`
when the translater had decided to print `%arg0` as `v1`.

As a follow-up PR, we will use the same infra to extend opaque type,
which provides a way to generate template types depending on the spelling of other types.
@mgehre-amd mgehre-amd force-pushed the matthias.emitc_verbatim_args branch from 7985fcc to fce7219 Compare February 18, 2025 14:52
@mgehre-amd mgehre-amd merged commit 0049adf into llvm:main Feb 18, 2025
8 checks passed
@mgehre-amd mgehre-amd deleted the matthias.emitc_verbatim_args branch February 18, 2025 20:37
@kazutakahirata
Copy link
Contributor

@mgehre-amd I've checked in 2c05d02 to fix warnings. I've chosen to add [[maybe_unused]] instead of removing the two functions in case you are in the process of building out things. If you don't need them, you might want to remove them. Thanks!

wldfngrs pushed a commit to wldfngrs/llvm-project that referenced this pull request Feb 19, 2025
Allows to print code snippets that refer to arguments or local
variables. E.g. `emitc.verbatim "#pragma abc var={}" args %arg0 :
!emitc.ptr<i32>` is printed as `#pragma abc var=v1` when the translator
had decided to print `%arg0` as `v1`.

As a follow-up PR, we will use the same infra to extend opaque type,
which provides a way to generate template types depending on the
spelling of other types.
@mgehre-amd
Copy link
Contributor Author

@mgehre-amd I've checked in 2c05d02 to fix warnings. I've chosen to add [[maybe_unused]] instead of removing the two functions in case you are in the process of building out things. If you don't need them, you might want to remove them. Thanks!

Thanks for the fix! This was an oversight on my side, and I will remove those functions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants