diff --git a/mlir/docs/BufferDeallocationInternals.md b/mlir/docs/BufferDeallocationInternals.md deleted file mode 100644 index 00830ba9d2dc2..0000000000000 --- a/mlir/docs/BufferDeallocationInternals.md +++ /dev/null @@ -1,705 +0,0 @@ -# Buffer Deallocation - Internals - -**Note:** This pass is deprecated. Please use the ownership-based buffer -deallocation pass instead. - -This section covers the internal functionality of the BufferDeallocation -transformation. The transformation consists of several passes. The main pass -called BufferDeallocation can be applied via “-buffer-deallocation” on MLIR -programs. - -[TOC] - -## Requirements - -In order to use BufferDeallocation on an arbitrary dialect, several control-flow -interfaces have to be implemented when using custom operations. This is -particularly important to understand the implicit control-flow dependencies -between different parts of the input program. Without implementing the following -interfaces, control-flow relations cannot be discovered properly and the -resulting program can become invalid: - -* Branch-like terminators should implement the `BranchOpInterface` to query - and manipulate associated operands. -* Operations involving structured control flow have to implement the - `RegionBranchOpInterface` to model inter-region control flow. -* Terminators yielding values to their parent operation (in particular in the - scope of nested regions within `RegionBranchOpInterface`-based operations), - should implement the `ReturnLike` trait to represent logical “value - returns”. - -Example dialects that are fully compatible are the “std” and “scf” dialects with -respect to all implemented interfaces. - -During Bufferization, we convert immutable value types (tensors) to mutable -types (memref). This conversion is done in several steps and in all of these -steps the IR has to fulfill SSA like properties. 
The usage of memref has to be -in the following consecutive order: allocation, write-buffer, read-buffer. In -this case, only buffer reads are allowed after the initial full buffer -write is done. In particular, there must be no partial write to a buffer after -the initial write has been finished. However, partial writes during initialization -are allowed (e.g., filling a buffer step by step in a loop). This means that all buffer -writes need to dominate all buffer reads. - -Example for breaking the invariant: - -```mlir -func.func @condBranch(%arg0: i1, %arg1: memref<2xf32>) { - %0 = memref.alloc() : memref<2xf32> - cf.cond_br %arg0, ^bb1, ^bb2 -^bb1: - cf.br ^bb3() -^bb2: - partial_write(%0, %0) - cf.br ^bb3() -^bb3(): - test.copy(%0, %arg1) : (memref<2xf32>, memref<2xf32>) -> () - return -} -``` - -The maintenance of the SSA-like properties is only needed during the bufferization -process. Afterwards, for example in optimization passes, the property is no -longer needed. - -## Detection of Buffer Allocations - -The first step of the BufferDeallocation transformation is to identify -manageable allocation operations that implement the `SideEffects` interface. -Furthermore, these ops need to apply the effect `MemoryEffects::Allocate` to a -particular result value while not using the resource -`SideEffects::AutomaticAllocationScopeResource` (since it is currently reserved -for allocations, like `Alloca`, that will be automatically deallocated by a -parent scope). Allocations that have not been detected in this phase will not be -tracked internally, and thus, not deallocated automatically.
However, -BufferDeallocation is fully compatible with “hybrid” setups in which tracked and -untracked allocations are mixed: - -```mlir -func.func @mixedAllocation(%arg0: i1) { - %0 = memref.alloca() : memref<2xf32> // aliases: %2 - %1 = memref.alloc() : memref<2xf32> // aliases: %2 - cf.cond_br %arg0, ^bb1, ^bb2 -^bb1: - use(%0) - cf.br ^bb3(%0 : memref<2xf32>) -^bb2: - use(%1) - cf.br ^bb3(%1 : memref<2xf32>) -^bb3(%2: memref<2xf32>): - ... -} -``` - -Example of using a conditional branch with alloc and alloca. BufferDeallocation -can detect and handle the different allocation types that might be intermixed. - -Note: the current version does not support allocation operations returning -multiple result buffers. - -## Conversion from AllocOp to AllocaOp - -The PromoteBuffersToStack pass converts AllocOps to AllocaOps, if possible. In -some cases, it can be useful to use such stack-based buffers instead of -heap-based buffers. The conversion is subject to several constraints: - -* Control flow -* Buffer Size -* Dynamic Size - -If a buffer leaves a block, we are not allowed to convert it into an alloca. -If the size of the buffer is large, we could convert it, but to avoid stack -overflows, it makes sense to limit the size of these buffers and only convert -small ones. The size limit can be set via a pass option. The current default value is -1KB. Furthermore, we cannot convert buffers with dynamic size, since their -dimensions are not known a priori. - -## Movement and Placement of Allocations - -Using the buffer hoisting pass, all buffer allocations are moved as far upwards -as possible in order to group them and make upcoming optimizations easier by -limiting the search space. Such a movement is shown in the following graphs. In -addition, we are able to statically free an alloc if we move it into a -dominator of all of its uses. This simplifies further optimizations (e.g. buffer -fusion) in the future.
However, movement of allocations is limited by external -data dependencies (in particular in the case of allocations of dynamically -shaped types). Furthermore, allocations can be moved out of nested regions, if -necessary. In order to move allocations to valid locations with respect to their -uses only, we leverage Liveness information. - -The following code snippet shows a conditional branch before running the -BufferHoisting pass: - -![branch_example_pre_move](/includes/img/branch_example_pre_move.svg) - -```mlir -func.func @condBranch(%arg0: i1, %arg1: memref<2xf32>, %arg2: memref<2xf32>) { - cf.cond_br %arg0, ^bb1, ^bb2 -^bb1: - cf.br ^bb3(%arg1 : memref<2xf32>) -^bb2: - %0 = memref.alloc() : memref<2xf32> // aliases: %1 - use(%0) - cf.br ^bb3(%0 : memref<2xf32>) -^bb3(%1: memref<2xf32>): // %1 could be %0 or %arg1 - test.copy(%1, %arg2) : (memref<2xf32>, memref<2xf32>) -> () - return -} -``` - -Applying the BufferHoisting pass on this program results in the following piece -of code: - -![branch_example_post_move](/includes/img/branch_example_post_move.svg) - -```mlir -func.func @condBranch(%arg0: i1, %arg1: memref<2xf32>, %arg2: memref<2xf32>) { - %0 = memref.alloc() : memref<2xf32> // moved to bb0 - cf.cond_br %arg0, ^bb1, ^bb2 -^bb1: - cf.br ^bb3(%arg1 : memref<2xf32>) -^bb2: - use(%0) - cf.br ^bb3(%0 : memref<2xf32>) -^bb3(%1: memref<2xf32>): - test.copy(%1, %arg2) : (memref<2xf32>, memref<2xf32>) -> () - return -} -``` - -The alloc is moved from bb2 to the beginning of the function and is passed as an argument to -bb3. - -The following example demonstrates an allocation using dynamically shaped types.
-Due to the data dependency of the allocation on %0, we cannot move the -allocation out of bb2 in this case: - -```mlir -func.func @condBranchDynamicType( - %arg0: i1, - %arg1: memref<?xf32>, - %arg2: memref<?xf32>, - %arg3: index) { - cf.cond_br %arg0, ^bb1, ^bb2(%arg3: index) -^bb1: - cf.br ^bb3(%arg1 : memref<?xf32>) -^bb2(%0: index): - %1 = memref.alloc(%0) : memref<?xf32> // cannot be moved upwards due to the data - // dependency to %0 - use(%1) - cf.br ^bb3(%1 : memref<?xf32>) -^bb3(%2: memref<?xf32>): - test.copy(%2, %arg2) : (memref<?xf32>, memref<?xf32>) -> () - return -} -``` - -## Introduction of Clones - -In order to guarantee that all allocated buffers are freed properly, we have to -pay attention to the control flow and all potential aliases a buffer allocation -can have. Since not all allocations can be safely freed with respect to their -aliases (see the following code snippet), it is often required to introduce -copies to eliminate them. Consider the following example in which the -allocations have already been placed: - -```mlir -func.func @branch(%arg0: i1) { - %0 = memref.alloc() : memref<2xf32> // aliases: %2 - cf.cond_br %arg0, ^bb1, ^bb2 -^bb1: - %1 = memref.alloc() : memref<2xf32> // resides here for demonstration purposes - // aliases: %2 - cf.br ^bb3(%1 : memref<2xf32>) -^bb2: - use(%0) - cf.br ^bb3(%0 : memref<2xf32>) -^bb3(%2: memref<2xf32>): - … - return -} -``` - -The first alloc can be safely freed after the live range of its post-dominator -block (bb3). The alloc in bb1 has an alias %2 in bb3 that also keeps this buffer -alive until the end of bb3. Since we cannot determine the actual branches that -will be taken at runtime, we have to ensure that all buffers are freed correctly -in bb3 regardless of the branches we will take to reach the exit block. This -makes it necessary to introduce a copy for %2, which allows us to free %alloc0 -in bb0 and %alloc1 in bb1.
Afterwards, we can continue processing all aliases of -%2 (none in this case) and we can safely free %2 at the end of the sample -program. This sample demonstrates that not all allocations can be safely freed -in their associated post-dominator blocks. Instead, we have to pay attention to -all of their aliases. - -Applying the BufferDeallocation pass to the program above yields the following -result: - -```mlir -func.func @branch(%arg0: i1) { - %0 = memref.alloc() : memref<2xf32> - cf.cond_br %arg0, ^bb1, ^bb2 -^bb1: - %1 = memref.alloc() : memref<2xf32> - %3 = bufferization.clone %1 : (memref<2xf32>) -> (memref<2xf32>) - memref.dealloc %1 : memref<2xf32> // %1 can be safely freed here - cf.br ^bb3(%3 : memref<2xf32>) -^bb2: - use(%0) - %4 = bufferization.clone %0 : (memref<2xf32>) -> (memref<2xf32>) - cf.br ^bb3(%4 : memref<2xf32>) -^bb3(%2: memref<2xf32>): - … - memref.dealloc %2 : memref<2xf32> // free temp buffer %2 - memref.dealloc %0 : memref<2xf32> // %0 can be safely freed here - return -} -``` - -Note that a temporary buffer for %2 was introduced to free all allocations -properly. Note further that the unnecessary allocation of %3 can be easily -removed using one of the post-pass transformations or the canonicalization pass. - -The presented example also works with dynamically shaped types. - -BufferDeallocation performs a fix-point iteration taking all aliases of all -tracked allocations into account. We initialize the general iteration process -using all tracked allocations and their associated aliases. As soon as we -encounter an alias that is not properly dominated by our allocation, we mark -this alias as *critical* (needs to be freed and tracked by the internal -fix-point iteration). 
The following sample demonstrates the presence of critical -and non-critical aliases: - -![nested_branch_example_pre_move](/includes/img/nested_branch_example_pre_move.svg) - -```mlir -func.func @condBranchDynamicTypeNested( - %arg0: i1, - %arg1: memref<?xf32>, // aliases: %3, %4 - %arg2: memref<?xf32>, - %arg3: index) { - cf.cond_br %arg0, ^bb1, ^bb2(%arg3: index) -^bb1: - cf.br ^bb6(%arg1 : memref<?xf32>) -^bb2(%0: index): - %1 = memref.alloc(%0) : memref<?xf32> // cannot be moved upwards due to the data - // dependency to %0 - // aliases: %2, %3, %4 - use(%1) - cf.cond_br %arg0, ^bb3, ^bb4 -^bb3: - cf.br ^bb5(%1 : memref<?xf32>) -^bb4: - cf.br ^bb5(%1 : memref<?xf32>) -^bb5(%2: memref<?xf32>): // non-crit. alias of %1, since %1 dominates %2 - cf.br ^bb6(%2 : memref<?xf32>) -^bb6(%3: memref<?xf32>): // crit. alias of %arg1 and %2 (in other words %1) - cf.br ^bb7(%3 : memref<?xf32>) -^bb7(%4: memref<?xf32>): // non-crit. alias of %3, since %3 dominates %4 - test.copy(%4, %arg2) : (memref<?xf32>, memref<?xf32>) -> () - return -} -``` - -Applying BufferDeallocation yields the following output: - -![nested_branch_example_post_move](/includes/img/nested_branch_example_post_move.svg) - -```mlir -func.func @condBranchDynamicTypeNested( - %arg0: i1, - %arg1: memref<?xf32>, - %arg2: memref<?xf32>, - %arg3: index) { - cf.cond_br %arg0, ^bb1, ^bb2(%arg3 : index) -^bb1: - // temp buffer required due to alias %3 - %5 = bufferization.clone %arg1 : (memref<?xf32>) -> (memref<?xf32>) - cf.br ^bb6(%5 : memref<?xf32>) -^bb2(%0: index): - %1 = memref.alloc(%0) : memref<?xf32> - use(%1) - cf.cond_br %arg0, ^bb3, ^bb4 -^bb3: - cf.br ^bb5(%1 : memref<?xf32>) -^bb4: - cf.br ^bb5(%1 : memref<?xf32>) -^bb5(%2: memref<?xf32>): - %6 = bufferization.clone %1 : (memref<?xf32>) -> (memref<?xf32>) - memref.dealloc %1 : memref<?xf32> - cf.br ^bb6(%6 : memref<?xf32>) -^bb6(%3: memref<?xf32>): - cf.br ^bb7(%3 : memref<?xf32>) -^bb7(%4: memref<?xf32>): - test.copy(%4, %arg2) : (memref<?xf32>, memref<?xf32>) -> () - memref.dealloc %3 : memref<?xf32> // free %3, since %4 is a non-crit.
alias of %3 - return -} -``` - -Since %3 is a critical alias, BufferDeallocation introduces an additional -temporary copy in all predecessor blocks. %3 has an additional (non-critical) -alias %4 that extends the live range until the end of bb7. Therefore, we can -free %3 after its last use, while taking all aliases into account. Note that %4 -does not need to be freed, since we did not introduce a copy for it. - -The actual introduction of buffer copies is done after the fix-point iteration -has been terminated and all critical aliases have been detected. A critical -alias can be either a block argument or another value that is returned by an -operation. Copies for block arguments are handled by analyzing all predecessor -blocks. This is primarily done by querying the `BranchOpInterface` of the -associated branch terminators that can jump to the current block. Consider the -following example which involves a simple branch and the critical block argument -%2: - -```mlir - custom.br ^bb1(..., %0, : ...) - ... - custom.br ^bb1(..., %1, : ...) - ... -^bb1(%2: memref<2xf32>): - ... -``` - -The `BranchOpInterface` allows us to determine the actual values that will be -passed to block bb1 and its argument %2 by analyzing its predecessor blocks. -Once we have resolved the values %0 and %1 (that are associated with %2 in this -sample), we can introduce a temporary buffer and clone its contents into the new -buffer. Afterwards, we rewire the branch operands to use the newly allocated -buffer instead. However, blocks can have implicitly defined predecessors by -parent ops that implement the `RegionBranchOpInterface`. This can be the case if -this block argument belongs to the entry block of a region. In this setting, we -have to identify all predecessor regions defined by the parent operation. For -every region, we need to get all terminator operations implementing the -`ReturnLike` trait, indicating that they can branch to our current block. 
-Finally, we can use similar functionality as described above to add the -temporary copy. This time, we can modify the terminator operands directly -without touching a high-level interface. - -Consider the following inner-region control-flow sample that uses an imaginary -“custom.region_if” operation. It either executes the “then” or “else” region and -always continues to the “join” region. The “custom.region_if_yield” operation -returns a result to the parent operation. This sample demonstrates the use of -the `RegionBranchOpInterface` to determine predecessors in order to infer the -high-level control flow: - -```mlir -func.func @inner_region_control_flow( - %arg0 : index, - %arg1 : index) -> memref<?x?xf32> { - %0 = memref.alloc(%arg0, %arg0) : memref<?x?xf32> - %1 = custom.region_if %0 : memref<?x?xf32> -> (memref<?x?xf32>) - then(%arg2 : memref<?x?xf32>) { // aliases: %arg4, %1 - custom.region_if_yield %arg2 : memref<?x?xf32> - } else(%arg3 : memref<?x?xf32>) { // aliases: %arg4, %1 - custom.region_if_yield %arg3 : memref<?x?xf32> - } join(%arg4 : memref<?x?xf32>) { // aliases: %1 - custom.region_if_yield %arg4 : memref<?x?xf32> - } - return %1 : memref<?x?xf32> -} -``` - -![region_branch_example_pre_move](/includes/img/region_branch_example_pre_move.svg) - -Non-block arguments (other values) can become aliases when they are returned by -dialect-specific operations. BufferDeallocation supports this behavior via the -`RegionBranchOpInterface`. Consider the following example that uses an “scf.if” -operation to determine the value of %2 at runtime, which creates an alias: - -```mlir -func.func @nested_region_control_flow(%arg0 : index, %arg1 : index) -> memref<?x?xf32> { - %0 = arith.cmpi "eq", %arg0, %arg1 : index - %1 = memref.alloc(%arg0, %arg0) : memref<?x?xf32> - %2 = scf.if %0 -> (memref<?x?xf32>) { - scf.yield %1 : memref<?x?xf32> // %2 will be an alias of %1 - } else { - %3 = memref.alloc(%arg0, %arg1) : memref<?x?xf32> // nested allocation in a div.
- branch - use(%3) - scf.yield %1 : memref<?x?xf32> // %2 will be an alias of %1 - } - return %2 : memref<?x?xf32> -} -``` - -In this example, a dealloc is inserted to release the buffer within the else -block since it cannot be accessed by the remainder of the program. Accessing the -`RegionBranchOpInterface` allows us to infer that %2 is a non-critical alias of -%1 which does not need to be tracked. - -```mlir -func.func @nested_region_control_flow(%arg0: index, %arg1: index) -> memref<?x?xf32> { - %0 = arith.cmpi "eq", %arg0, %arg1 : index - %1 = memref.alloc(%arg0, %arg0) : memref<?x?xf32> - %2 = scf.if %0 -> (memref<?x?xf32>) { - scf.yield %1 : memref<?x?xf32> - } else { - %3 = memref.alloc(%arg0, %arg1) : memref<?x?xf32> - use(%3) - memref.dealloc %3 : memref<?x?xf32> // %3 can be safely freed here - scf.yield %1 : memref<?x?xf32> - } - return %2 : memref<?x?xf32> -} -``` - -Analogous to the previous case, we have to detect all terminator operations in -all attached regions of “scf.if” that provide a value to their parent operation -(in this sample via scf.yield). Querying the `RegionBranchOpInterface` allows us -to determine the regions that “return” a result to their parent operation. Like -before, we have to update all `ReturnLike` terminators as described above. -Reconsider a slightly adapted version of the “custom.region_if” example from -above that uses a nested allocation: - -```mlir -func.func @inner_region_control_flow_div( - %arg0 : index, - %arg1 : index) -> memref<?x?xf32> { - %0 = memref.alloc(%arg0, %arg0) : memref<?x?xf32> - %1 = custom.region_if %0 : memref<?x?xf32> -> (memref<?x?xf32>) - then(%arg2 : memref<?x?xf32>) { // aliases: %arg4, %1 - custom.region_if_yield %arg2 : memref<?x?xf32> - } else(%arg3 : memref<?x?xf32>) { - %2 = memref.alloc(%arg0, %arg1) : memref<?x?xf32> // aliases: %arg4, %1 - custom.region_if_yield %2 : memref<?x?xf32> - } join(%arg4 : memref<?x?xf32>) { // aliases: %1 - custom.region_if_yield %arg4 : memref<?x?xf32> - } - return %1 : memref<?x?xf32> -} -``` - -Since the allocation %2 happens in a divergent branch and cannot be safely -deallocated in a post-dominator, %arg4 will be considered a critical alias.
-Furthermore, %arg4 is returned to its parent operation and has an alias %1. This -causes BufferDeallocation to introduce additional copies: - -```mlir -func.func @inner_region_control_flow_div( - %arg0 : index, - %arg1 : index) -> memref<?x?xf32> { - %0 = memref.alloc(%arg0, %arg0) : memref<?x?xf32> - %1 = custom.region_if %0 : memref<?x?xf32> -> (memref<?x?xf32>) - then(%arg2 : memref<?x?xf32>) { - %4 = bufferization.clone %arg2 : (memref<?x?xf32>) -> (memref<?x?xf32>) - custom.region_if_yield %4 : memref<?x?xf32> - } else(%arg3 : memref<?x?xf32>) { - %2 = memref.alloc(%arg0, %arg1) : memref<?x?xf32> - %5 = bufferization.clone %2 : (memref<?x?xf32>) -> (memref<?x?xf32>) - memref.dealloc %2 : memref<?x?xf32> - custom.region_if_yield %5 : memref<?x?xf32> - } join(%arg4: memref<?x?xf32>) { - %4 = bufferization.clone %arg4 : (memref<?x?xf32>) -> (memref<?x?xf32>) - memref.dealloc %arg4 : memref<?x?xf32> - custom.region_if_yield %4 : memref<?x?xf32> - } - memref.dealloc %0 : memref<?x?xf32> // %0 can be safely freed here - return %1 : memref<?x?xf32> -} -``` - -## Placement of Deallocs - -After introducing allocs and copies, deallocs have to be placed to free -allocated memory and avoid memory leaks. The deallocation needs to take place -after the last use of the given value. The position can be determined by -calculating the common post-dominator of all uses of a value and of its remaining -non-critical aliases. A special case is the presence of back edges, since such -edges can cause memory leaks when a newly allocated buffer flows back to another -part of the program. In these cases, we need to free the associated buffer -instances from the previous iteration by inserting additional deallocs.
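The common post-dominator computation described above can be illustrated with a small sketch. This is a hypothetical Python toy, not MLIR's actual dominance implementation; the CFG encoding and the iterative dataflow formulation are assumptions made here for illustration only:

```python
# Toy diamond CFG: bb0 branches to bb1/bb2, which both join in bb3 (the exit).
# Dealloc placement picks the nearest common post-dominator of every block
# that uses (or holds an alias of) the buffer.

def post_dominators(succs, exit_block):
    """Iterative dataflow: postdom(b) = {b} | intersection of postdom(s)
    over all successors s. Assumes every non-exit block has a successor."""
    blocks = list(succs)
    postdom = {b: set(blocks) for b in blocks}
    postdom[exit_block] = {exit_block}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            if b == exit_block:
                continue
            new = {b} | set.intersection(*(postdom[s] for s in succs[b]))
            if new != postdom[b]:
                postdom[b], changed = new, True
    return postdom

def nearest_common_post_dominator(use_blocks, succs, exit_block):
    pd = post_dominators(succs, exit_block)
    common = set.intersection(*(pd[b] for b in use_blocks))
    # The nearest candidate is the one post-dominated by all other candidates.
    return next(c for c in common if common <= pd[c])

succs = {"bb0": ["bb1", "bb2"], "bb1": ["bb3"], "bb2": ["bb3"], "bb3": []}
# The buffer has aliases used in bb1 and bb2; the dealloc belongs in bb3.
print(nearest_common_post_dominator({"bb1", "bb2"}, succs, "bb3"))  # bb3
```

A back edge changes the picture: a buffer flowing around the loop is re-created each iteration, which is why the extra per-iteration deallocs described above are needed in addition to this placement rule.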
- -Consider the following “scf.for” use case containing a nested structured -control-flow if: - -```mlir -func.func @loop_nested_if( - %lb: index, - %ub: index, - %step: index, - %buf: memref<2xf32>, - %res: memref<2xf32>) { - %0 = scf.for %i = %lb to %ub step %step - iter_args(%iterBuf = %buf) -> memref<2xf32> { - %1 = arith.cmpi "eq", %i, %ub : index - %2 = scf.if %1 -> (memref<2xf32>) { - %3 = memref.alloc() : memref<2xf32> // makes %2 a critical alias due to a - // divergent allocation - use(%3) - scf.yield %3 : memref<2xf32> - } else { - scf.yield %iterBuf : memref<2xf32> - } - scf.yield %2 : memref<2xf32> - } - test.copy(%0, %res) : (memref<2xf32>, memref<2xf32>) -> () - return -} -``` - -In this example, the *then* branch of the nested “scf.if” operation returns a -newly allocated buffer. - -Since this allocation happens in the scope of a divergent branch, %2 becomes a -critical alias that needs to be handled. As before, we have to insert additional -copies to eliminate this alias using copies of %3 and %iterBuf. This guarantees -that %2 will be a newly allocated buffer that is returned in each iteration. -However, “returning” %2 to its alias %iterBuf turns %iterBuf into a critical -alias as well. In other words, we have to create a copy of %2 to pass it to -%iterBuf. 
Since this jump represents a back edge, and %2 will always be a new -buffer, we have to free the buffer from the previous iteration to avoid memory -leaks: - -```mlir -func.func @loop_nested_if( - %lb: index, - %ub: index, - %step: index, - %buf: memref<2xf32>, - %res: memref<2xf32>) { - %4 = bufferization.clone %buf : (memref<2xf32>) -> (memref<2xf32>) - %0 = scf.for %i = %lb to %ub step %step - iter_args(%iterBuf = %4) -> memref<2xf32> { - %1 = arith.cmpi "eq", %i, %ub : index - %2 = scf.if %1 -> (memref<2xf32>) { - %3 = memref.alloc() : memref<2xf32> // makes %2 a critical alias - use(%3) - %5 = bufferization.clone %3 : (memref<2xf32>) -> (memref<2xf32>) - memref.dealloc %3 : memref<2xf32> - scf.yield %5 : memref<2xf32> - } else { - %6 = bufferization.clone %iterBuf : (memref<2xf32>) -> (memref<2xf32>) - scf.yield %6 : memref<2xf32> - } - %7 = bufferization.clone %2 : (memref<2xf32>) -> (memref<2xf32>) - memref.dealloc %2 : memref<2xf32> - memref.dealloc %iterBuf : memref<2xf32> // free backedge iteration variable - scf.yield %7 : memref<2xf32> - } - test.copy(%0, %res) : (memref<2xf32>, memref<2xf32>) -> () - memref.dealloc %0 : memref<2xf32> // free temp copy %0 - return -} -``` - -Example for loop-like control flow. The CFG contains back edges that have to be -handled to avoid memory leaks. The bufferization is able to free the backedge -iteration variable %iterBuf. - -## Private Analyses Implementations - -The BufferDeallocation transformation relies on one primary control-flow -analysis: BufferPlacementAliasAnalysis. Furthermore, we also use dominance and -liveness to place and move nodes. The liveness analysis determines the live -range of a given value. Within this range, a value is alive and can or will be -used in the course of the program. After this range, the value is dead and can -be discarded - in our case, the buffer can be freed. To place the allocs, we -need to know from which position a value will be alive. 
The allocs have to be -placed in front of this position. However, the most important analysis is the -alias analysis that is needed to introduce copies and to place all -deallocations. - -# Post Phase - -In order to limit the complexity of the BufferDeallocation transformation, some -tiny code-polishing/optimization transformations are not applied on-the-fly -during placement. Currently, a canonicalization pattern is added to the clone -operation to reduce the appearance of unnecessary clones. - -Note: further transformations might be added to the post-pass phase in the -future. - -## Clone Canonicalization - -During placement of clones it may happen that unnecessary clones are inserted. -If these clones appear together with their corresponding dealloc operation within the -same block, we can use the canonicalizer to remove these unnecessary operations. -Note that this step needs to take place after the insertion of clones and -deallocs in the buffer deallocation step. The canonicalization includes both the -newly created target value of the clone operation and its source operation. - -## Canonicalization of the Source Buffer of the Clone Operation - -In this case, the source of the clone operation can be used instead of its -target. The unused allocation and deallocation operations that are defined for -this clone operation are also removed. Here is a working example generated by -the BufferDeallocation pass that allocates a buffer with dynamic size. A deeper -analysis of this sample reveals that the highlighted operations are redundant -and can be removed.
- -```mlir -func.func @dynamic_allocation(%arg0: index, %arg1: index) -> memref<?x?xf32> { - %1 = memref.alloc(%arg0, %arg1) : memref<?x?xf32> - %2 = bufferization.clone %1 : (memref<?x?xf32>) -> (memref<?x?xf32>) - memref.dealloc %1 : memref<?x?xf32> - return %2 : memref<?x?xf32> -} -``` - -Will be transformed to: - -```mlir -func.func @dynamic_allocation(%arg0: index, %arg1: index) -> memref<?x?xf32> { - %1 = memref.alloc(%arg0, %arg1) : memref<?x?xf32> - return %1 : memref<?x?xf32> -} -``` - -In this case, the additional copy %2 can be replaced with its original source -buffer %1. This also applies to the associated dealloc operation of %1. - -## Canonicalization of the Target Buffer of the Clone Operation - -In this case, the target buffer of the clone operation can be used instead of -its source. The unused deallocation operation that is defined for this clone -operation is also removed. - -Consider the following example where a generic test operation writes the result -to %temp and then copies %temp to %result. However, these two operations can be -merged into a single step.
Canonicalization removes the clone operation and -%temp, and replaces the uses of %temp with %result: - -```mlir -func.func @reuseTarget(%arg0: memref<2xf32>, %result: memref<2xf32>){ - %temp = memref.alloc() : memref<2xf32> - test.generic { - args_in = 1 : i64, - args_out = 1 : i64, - indexing_maps = [#map0, #map0], - iterator_types = ["parallel"]} %arg0, %temp { - ^bb0(%gen2_arg0: f32, %gen2_arg1: f32): - %tmp2 = math.exp %gen2_arg0 : f32 - test.yield %tmp2 : f32 - }: memref<2xf32>, memref<2xf32> - %result = bufferization.clone %temp : (memref<2xf32>) -> (memref<2xf32>) - memref.dealloc %temp : memref<2xf32> - return -} -``` - -Will be transformed to: - -```mlir -func.func @reuseTarget(%arg0: memref<2xf32>, %result: memref<2xf32>){ - test.generic { - args_in = 1 : i64, - args_out = 1 : i64, - indexing_maps = [#map0, #map0], - iterator_types = ["parallel"]} %arg0, %result { - ^bb0(%gen2_arg0: f32, %gen2_arg1: f32): - %tmp2 = math.exp %gen2_arg0 : f32 - test.yield %tmp2 : f32 - }: memref<2xf32>, memref<2xf32> - return -} -``` - -## Known Limitations - -BufferDeallocation introduces additional clones via the “bufferization.clone” -operation. Analogously, all deallocations use the “memref.dealloc” operation from the -“memref” dialect. The actual copy process is realized -using “test.copy”. Furthermore, buffers are essentially immutable after their -creation in a block. Further limitations are known in cases involving -unstructured control flow. diff --git a/mlir/docs/OwnershipBasedBufferDeallocation.md b/mlir/docs/OwnershipBasedBufferDeallocation.md index 9036c811c5daf..f5fa01c4c49cc 100644 --- a/mlir/docs/OwnershipBasedBufferDeallocation.md +++ b/mlir/docs/OwnershipBasedBufferDeallocation.md @@ -5,9 +5,7 @@ One-Shot Bufferize does not deallocate any buffers that it allocates. After running One-Shot Bufferize, the resulting IR may have a number of `memref.alloc` ops, but no `memref.dealloc` ops.
Buffer dellocation is delegated to the -`-ownership-based-buffer-deallocation` pass. This pass supersedes the now -deprecated `-buffer-deallocation` pass, which does not work well with -One-Shot Bufferize. +`-ownership-based-buffer-deallocation` pass. On a high level, buffers are "owned" by a basic block. Ownership materializes as an `i1` SSA value and can be thought of as "responsibility to deallocate". It diff --git a/mlir/docs/includes/img/branch_example_post_move.svg b/mlir/docs/includes/img/branch_example_post_move.svg deleted file mode 100644 index 870df495a13c6..0000000000000 --- a/mlir/docs/includes/img/branch_example_post_move.svg +++ /dev/null @@ -1,419 +0,0 @@ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - image/svg+xml - - - - - - - - - - - - - - - - - in: %arg0, %arg1, %arg2 - bb0 - - bb2 - bb1 - - bb3 (%1) - -use(%0) -cf.br bb3(%0) - - copy (%1, arg2) - %0 - %arg1 - %0 = memref.alloc() - - diff --git a/mlir/docs/includes/img/branch_example_pre_move.svg b/mlir/docs/includes/img/branch_example_pre_move.svg deleted file mode 100644 index 5eb15fd13946e..0000000000000 --- a/mlir/docs/includes/img/branch_example_pre_move.svg +++ /dev/null @@ -1,409 +0,0 @@ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - image/svg+xml - - - - - - - - - - - - - - - - - in: %arg0, %arg1, %arg2 - bb0 - - bb2 - bb1 - - bb3 (%1) - %0 = memref.alloc() -use(%0) -cf.br bb3(%0) - - copy (%1, arg2) - %0 - %arg1 - - diff --git a/mlir/docs/includes/img/nested_branch_example_post_move.svg b/mlir/docs/includes/img/nested_branch_example_post_move.svg deleted file mode 100644 index 27923627ad3d2..0000000000000 --- a/mlir/docs/includes/img/nested_branch_example_post_move.svg +++ /dev/null @@ -1,759 +0,0 @@ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - image/svg+xml - - - - - - - - - - - - - in: %arg0, %arg1, %arg2, %arg3 - bb0 - - bb1 - - - %5 
- - - - - bb2 (%0) - bb4 - bb3 - - bb5 (%2) - %1 - - bb7 (%4) - - - - - - - - - bb6 (%3) - %1 - %6 - - %1 = memref.alloc(%0)use(%1) - - copy(%4, %arg2)dealloc %3 - %3 - %5 = memref.alloc(%d0)copy(%arg1, %5) - %6 = memref.alloc(%d1)copy(%1, %6)dealloc %1 - - diff --git a/mlir/docs/includes/img/nested_branch_example_pre_move.svg b/mlir/docs/includes/img/nested_branch_example_pre_move.svg deleted file mode 100644 index 9f2c603511f84..0000000000000 --- a/mlir/docs/includes/img/nested_branch_example_pre_move.svg +++ /dev/null @@ -1,717 +0,0 @@ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - image/svg+xml - - - - - - - - - - - - - in: %arg0, %arg1, %arg2, %arg3 - bb0 - - bb1 - - - %arg1 - - - - - bb2 (%0) - bb4 - bb3 - - bb5 (%2) - %1 - - bb7 (%4) - - - - - - - - - bb6 (%3) - %1 - %2 - - %1 = memref.alloc(%0)use(%0) - - copy(%4, %arg2) - %3 - - diff --git a/mlir/docs/includes/img/region_branch_example_pre_move.svg b/mlir/docs/includes/img/region_branch_example_pre_move.svg deleted file mode 100644 index 79c83fbe35a9e..0000000000000 --- a/mlir/docs/includes/img/region_branch_example_pre_move.svg +++ /dev/null @@ -1,435 +0,0 @@ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - image/svg+xml - - - - - - - - - - - - - - - - - %0 - if - - then - %1 - else - - join - %arg4 - %arg4 - %arg2 - %arg3 - - - diff --git a/mlir/include/mlir/Dialect/Bufferization/Transforms/Passes.h b/mlir/include/mlir/Dialect/Bufferization/Transforms/Passes.h index c8e456a1d7e38..c5d0853d6ff97 100644 --- a/mlir/include/mlir/Dialect/Bufferization/Transforms/Passes.h +++ b/mlir/include/mlir/Dialect/Bufferization/Transforms/Passes.h @@ -30,10 +30,6 @@ using DeallocHelperMap = llvm::DenseMap; #define GEN_PASS_DECL #include "mlir/Dialect/Bufferization/Transforms/Passes.h.inc" -/// Creates an instance of the BufferDeallocation pass to free all allocated -/// buffers. 
-std::unique_ptr createBufferDeallocationPass(); - /// Creates an instance of the OwnershipBasedBufferDeallocation pass to free all /// allocated buffers. std::unique_ptr createOwnershipBasedBufferDeallocationPass( @@ -141,9 +137,6 @@ void populateBufferizationDeallocLoweringPattern( func::FuncOp buildDeallocationLibraryFunction(OpBuilder &builder, Location loc, SymbolTable &symbolTable); -/// Run buffer deallocation. -LogicalResult deallocateBuffers(Operation *op); - /// Run the ownership-based buffer deallocation. LogicalResult deallocateBuffersOwnershipBased(FunctionOpInterface op, DeallocationOptions options); diff --git a/mlir/include/mlir/Dialect/Bufferization/Transforms/Passes.td b/mlir/include/mlir/Dialect/Bufferization/Transforms/Passes.td index 3bcde8edde509..f20f177d8443b 100644 --- a/mlir/include/mlir/Dialect/Bufferization/Transforms/Passes.td +++ b/mlir/include/mlir/Dialect/Bufferization/Transforms/Passes.td @@ -11,79 +11,6 @@ include "mlir/Pass/PassBase.td" -def BufferDeallocation : Pass<"buffer-deallocation", "func::FuncOp"> { - let summary = "Adds all required dealloc operations for all allocations in " - "the input program"; - let description = [{ - This pass implements an algorithm to automatically introduce all required - deallocation operations for all buffers in the input program. This ensures - that the resulting program does not have any memory leaks. 
- - - Input - - ```mlir - #map0 = affine_map<(d0) -> (d0)> - module { - func.func @condBranch(%arg0: i1, %arg1: memref<2xf32>, %arg2: memref<2xf32>) { - cf.cond_br %arg0, ^bb1, ^bb2 - ^bb1: - cf.br ^bb3(%arg1 : memref<2xf32>) - ^bb2: - %0 = memref.alloc() : memref<2xf32> - linalg.generic { - indexing_maps = [#map0, #map0], - iterator_types = ["parallel"]} %arg1, %0 { - ^bb0(%gen1_arg0: f32, %gen1_arg1: f32): - %tmp1 = exp %gen1_arg0 : f32 - linalg.yield %tmp1 : f32 - }: memref<2xf32>, memref<2xf32> - cf.br ^bb3(%0 : memref<2xf32>) - ^bb3(%1: memref<2xf32>): - "memref.copy"(%1, %arg2) : (memref<2xf32>, memref<2xf32>) -> () - return - } - } - - ``` - - Output - - ```mlir - #map0 = affine_map<(d0) -> (d0)> - module { - func.func @condBranch(%arg0: i1, %arg1: memref<2xf32>, %arg2: memref<2xf32>) { - cf.cond_br %arg0, ^bb1, ^bb2 - ^bb1: // pred: ^bb0 - %0 = memref.alloc() : memref<2xf32> - memref.copy(%arg1, %0) : memref<2xf32>, memref<2xf32> - cf.br ^bb3(%0 : memref<2xf32>) - ^bb2: // pred: ^bb0 - %1 = memref.alloc() : memref<2xf32> - linalg.generic { - indexing_maps = [#map0, #map0], - iterator_types = ["parallel"]} %arg1, %1 { - ^bb0(%arg3: f32, %arg4: f32): - %4 = exp %arg3 : f32 - linalg.yield %4 : f32 - }: memref<2xf32>, memref<2xf32> - %2 = memref.alloc() : memref<2xf32> - memref.copy(%1, %2) : memref<2xf32>, memref<2xf32> - dealloc %1 : memref<2xf32> - cf.br ^bb3(%2 : memref<2xf32>) - ^bb3(%3: memref<2xf32>): // 2 preds: ^bb1, ^bb2 - memref.copy(%3, %arg2) : memref<2xf32>, memref<2xf32> - dealloc %3 : memref<2xf32> - return - } - - } - ``` - - }]; - let constructor = "mlir::bufferization::createBufferDeallocationPass()"; -} - def OwnershipBasedBufferDeallocation : Pass< "ownership-based-buffer-deallocation"> { let summary = "Adds all required dealloc operations for all allocations in " @@ -390,8 +317,9 @@ def OneShotBufferize : Pass<"one-shot-bufferize", "ModuleOp"> { results in a new buffer allocation. 
One-Shot Bufferize does not deallocate any buffers that it allocates. The
-    `-buffer-deallocation` pass should be run after One-Shot Bufferize to insert
-    the deallocation operations necessary to eliminate memory leaks.
+    `-buffer-deallocation-pipeline` pipeline should be run after One-Shot
+    Bufferize to insert the deallocation operations necessary to eliminate
+    memory leaks.
 
     One-Shot Bufferize will by default reject IR that contains non-bufferizable
     ops, i.e., ops that do not implement BufferizableOpInterface. Such IR can
diff --git a/mlir/lib/Dialect/Bufferization/Transforms/BufferDeallocation.cpp b/mlir/lib/Dialect/Bufferization/Transforms/BufferDeallocation.cpp
deleted file mode 100644
index a0a81d4add712..0000000000000
--- a/mlir/lib/Dialect/Bufferization/Transforms/BufferDeallocation.cpp
+++ /dev/null
@@ -1,693 +0,0 @@
-//===- BufferDeallocation.cpp - the impl for buffer deallocation ----------===//
-//
-// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
-// See https://llvm.org/LICENSE.txt for license information.
-// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
-//
-//===----------------------------------------------------------------------===//
-//
-// This file implements logic for computing correct alloc and dealloc positions.
-// Furthermore, buffer deallocation also adds required new clone operations to
-// ensure that all buffers are deallocated. The main class is the
-// BufferDeallocationPass class that implements the underlying algorithm. In
-// order to put allocations and deallocations at safe positions, it is
-// significantly important to put them into the correct blocks. However, the
-// liveness analysis does not pay attention to aliases, which can occur due to
-// branches (and their associated block arguments) in general. For this purpose,
-// BufferDeallocation firstly finds all possible aliases for a single value
-// (using the BufferViewFlowAnalysis class).
Consider the following example:
-//
-// ^bb0(%arg0):
-//   cf.cond_br %cond, ^bb1, ^bb2
-// ^bb1:
-//   cf.br ^exit(%arg0)
-// ^bb2:
-//   %new_value = ...
-//   cf.br ^exit(%new_value)
-// ^exit(%arg1):
-//   return %arg1;
-//
-// We should place the dealloc for %new_value in exit. However, we have to free
-// the buffer in the same block, because it cannot be freed in the post
-// dominator. This, in turn, requires a new clone buffer for %arg1 that will
-// contain the actual contents. Using the class BufferViewFlowAnalysis, we
-// will find out that %new_value has a potential alias %arg1. In order to find
-// the dealloc position we have to find all potential aliases, iterate over
-// their uses and find the common post-dominator block (note that additional
-// clones and buffers remove potential aliases and will influence the placement
-// of the deallocs). In all cases, the computed block can be safely used to free
-// the %new_value buffer (may be exit or bb2) as it will die and we can use
-// liveness information to determine the exact operation after which we have to
-// insert the dealloc. However, the algorithm supports introducing clone buffers
-// and placing deallocs in safe locations to ensure that all buffers will be
-// freed in the end.
-//
-// TODO:
-// The current implementation does not support explicit-control-flow loops and
-// the resulting code will be invalid with respect to program semantics.
-// However, structured control-flow loops are fully supported. Furthermore, it
-// does not accept functions that already return buffers.
-//
-//===----------------------------------------------------------------------===//
-
-#include "mlir/Dialect/Bufferization/Transforms/Passes.h"
-
-#include "mlir/Dialect/Bufferization/IR/AllocationOpInterface.h"
-#include "mlir/Dialect/Bufferization/IR/Bufferization.h"
-#include "mlir/Dialect/Bufferization/Transforms/BufferUtils.h"
-#include "mlir/Dialect/Func/IR/FuncOps.h"
-#include "mlir/Dialect/MemRef/IR/MemRef.h"
-#include "llvm/ADT/SetOperations.h"
-
-namespace mlir {
-namespace bufferization {
-#define GEN_PASS_DEF_BUFFERDEALLOCATION
-#include "mlir/Dialect/Bufferization/Transforms/Passes.h.inc"
-} // namespace bufferization
-} // namespace mlir
-
-using namespace mlir;
-using namespace mlir::bufferization;
-
-/// Walks over all immediate return-like terminators in the given region.
-static LogicalResult walkReturnOperations(
-    Region *region,
-    llvm::function_ref<LogicalResult(RegionBranchTerminatorOpInterface)> func) {
-  for (Block &block : *region) {
-    Operation *terminator = block.getTerminator();
-    // Skip non region-return-like terminators.
-    if (auto regionTerminator =
-            dyn_cast<RegionBranchTerminatorOpInterface>(terminator)) {
-      if (failed(func(regionTerminator)))
-        return failure();
-    }
-  }
-  return success();
-}
-
-/// Checks if all operations that have at least one attached region implement
-/// the RegionBranchOpInterface. This is not required in edge cases where we
-/// have a single attached region and the parent operation has no results.
-static bool validateSupportedControlFlow(Operation *op) {
-  WalkResult result = op->walk([&](Operation *operation) {
-    // Only check ops that are inside a function.
-    if (!operation->getParentOfType<func::FuncOp>())
-      return WalkResult::advance();
-
-    auto regions = operation->getRegions();
-    // Walk over all operations in a region and check if the operation has at
-    // least one region and implements the RegionBranchOpInterface. If there
-    // is an operation that does not fulfill this condition, we cannot apply
-    // the deallocation steps.
Furthermore, we accept cases where we have a
-    // region that returns no results, since, in that case, the intra-region
-    // control flow does not affect the transformation.
-    size_t size = regions.size();
-    if (((size == 1 && !operation->getResults().empty()) || size > 1) &&
-        !dyn_cast<RegionBranchOpInterface>(operation)) {
-      operation->emitError("All operations with attached regions need to "
-                           "implement the RegionBranchOpInterface.");
-    }
-
-    return WalkResult::advance();
-  });
-  return !result.wasSkipped();
-}
-
-namespace {
-
-//===----------------------------------------------------------------------===//
-// Backedges analysis
-//===----------------------------------------------------------------------===//
-
-/// A straight-forward program analysis which detects loop backedges induced by
-/// explicit control flow.
-class Backedges {
-public:
-  using BlockSetT = SmallPtrSet<Block *, 16>;
-  using BackedgeSetT = llvm::DenseSet<std::pair<Block *, Block *>>;
-
-public:
-  /// Constructs a new backedges analysis using the op provided.
-  Backedges(Operation *op) { recurse(op); }
-
-  /// Returns the number of backedges formed by explicit control flow.
-  size_t size() const { return edgeSet.size(); }
-
-  /// Returns the start iterator to loop over all backedges.
-  BackedgeSetT::const_iterator begin() const { return edgeSet.begin(); }
-
-  /// Returns the end iterator to loop over all backedges.
-  BackedgeSetT::const_iterator end() const { return edgeSet.end(); }
-
-private:
-  /// Enters the current block and inserts a backedge into the `edgeSet` if we
-  /// have already visited the current block. The inserted edge links the given
-  /// `predecessor` with the `current` block.
-  bool enter(Block &current, Block *predecessor) {
-    bool inserted = visited.insert(&current).second;
-    if (!inserted)
-      edgeSet.insert(std::make_pair(predecessor, &current));
-    return inserted;
-  }
-
-  /// Leaves the current block.
-  void exit(Block &current) { visited.erase(&current); }
-
-  /// Recurses into the given operation while taking all attached regions into
-  /// account.
-  void recurse(Operation *op) {
-    Block *current = op->getBlock();
-    // If the current op implements the `BranchOpInterface`, there can be
-    // cycles in the scope of all successor blocks.
-    if (isa<BranchOpInterface>(op)) {
-      for (Block *succ : current->getSuccessors())
-        recurse(*succ, current);
-    }
-    // Recurse into all distinct regions and check for explicit control-flow
-    // loops.
-    for (Region &region : op->getRegions()) {
-      if (!region.empty())
-        recurse(region.front(), current);
-    }
-  }
-
-  /// Recurses into explicit control-flow structures that are given by
-  /// the successor relation defined on the block level.
-  void recurse(Block &block, Block *predecessor) {
-    // Try to enter the current block. If this is not possible, we are
-    // currently processing this block and can safely return here.
-    if (!enter(block, predecessor))
-      return;
-
-    // Recurse into all operations and successor blocks.
-    for (Operation &op : block.getOperations())
-      recurse(&op);
-
-    // Leave the current block.
-    exit(block);
-  }
-
-  /// Stores all blocks that are currently visited and on the processing stack.
-  BlockSetT visited;
-
-  /// Stores all backedges in the format (source, target).
-  BackedgeSetT edgeSet;
-};
-
-//===----------------------------------------------------------------------===//
-// BufferDeallocation
-//===----------------------------------------------------------------------===//
-
-/// The buffer deallocation transformation which ensures that all allocs in the
-/// program have a corresponding de-allocation. As a side-effect, it might also
-/// introduce clones that in turn lead to additional deallocations.
-class BufferDeallocation : public BufferPlacementTransformationBase {
-public:
-  using AliasAllocationMapT =
-      llvm::DenseMap<Value, bufferization::AllocationOpInterface>;
-
-  BufferDeallocation(Operation *op)
-      : BufferPlacementTransformationBase(op), dominators(op),
-        postDominators(op) {}
-
-  /// Checks if all allocation operations either provide an already existing
-  /// deallocation operation or implement the AllocationOpInterface. In
-  /// addition, this method initializes the internal alias to
-  /// AllocationOpInterface mapping in order to get compatible
-  /// AllocationOpInterface implementations for aliases.
-  LogicalResult prepare() {
-    for (const BufferPlacementAllocs::AllocEntry &entry : allocs) {
-      // Get the defining allocation operation.
-      Value alloc = std::get<0>(entry);
-      auto allocationInterface =
-          alloc.getDefiningOp<bufferization::AllocationOpInterface>();
-      // If there is no existing deallocation operation and no implementation of
-      // the AllocationOpInterface, we cannot apply the BufferDeallocation pass.
-      if (!std::get<1>(entry) && !allocationInterface) {
-        return alloc.getDefiningOp()->emitError(
-            "Allocation is not deallocated explicitly nor does the operation "
-            "implement the AllocationOpInterface.");
-      }
-
-      // Register the current allocation interface implementation.
-      aliasToAllocations[alloc] = allocationInterface;
-
-      // Get the alias information for the current allocation node.
-      for (Value alias : aliases.resolve(alloc)) {
-        // TODO: check for incompatible implementations of the
-        // AllocationOpInterface. This could be realized by promoting the
-        // AllocationOpInterface to a DialectInterface.
-        aliasToAllocations[alias] = allocationInterface;
-      }
-    }
-    return success();
-  }
-
-  /// Performs the actual placement/creation of all temporary clone and dealloc
-  /// nodes.
-  LogicalResult deallocate() {
-    // Add additional clones that are required.
-    if (failed(introduceClones()))
-      return failure();
-
-    // Place deallocations for all allocation entries.
-    return placeDeallocs();
-  }
-
-private:
-  /// Introduces required clone operations to avoid memory leaks.
-  LogicalResult introduceClones() {
-    // Initialize the set of values that require a dedicated memory free
-    // operation since their operands cannot be safely deallocated in a post
-    // dominator.
-    SetVector<Value> valuesToFree;
-    llvm::SmallDenseSet<std::tuple<Value, Block *>> visitedValues;
-    SmallVector<std::tuple<Value, Block *>, 8> toProcess;
-
-    // Check dominance relation for proper dominance properties. If the given
-    // value node does not dominate an alias, we will have to create a clone in
-    // order to free all buffers that can potentially leak into a post
-    // dominator.
-    auto findUnsafeValues = [&](Value source, Block *definingBlock) {
-      auto it = aliases.find(source);
-      if (it == aliases.end())
-        return;
-      for (Value value : it->second) {
-        if (valuesToFree.count(value) > 0)
-          continue;
-        Block *parentBlock = value.getParentBlock();
-        // Check whether we have to free this particular block argument or
-        // generic value. We have to free the current alias if it is either
-        // defined in a non-dominated block or it is defined in the same block
-        // but the current value is not dominated by the source value.
-        if (!dominators.dominates(definingBlock, parentBlock) ||
-            (definingBlock == parentBlock && isa<BlockArgument>(value))) {
-          toProcess.emplace_back(value, parentBlock);
-          valuesToFree.insert(value);
-        } else if (visitedValues.insert(std::make_tuple(value, definingBlock))
-                       .second)
-          toProcess.emplace_back(value, definingBlock);
-      }
-    };
-
-    // Detect possibly unsafe aliases starting from all allocations.
-    for (BufferPlacementAllocs::AllocEntry &entry : allocs) {
-      Value allocValue = std::get<0>(entry);
-      findUnsafeValues(allocValue, allocValue.getDefiningOp()->getBlock());
-    }
-    // Try to find block arguments that require an explicit free operation
-    // until we reach a fix point.
-    while (!toProcess.empty()) {
-      auto current = toProcess.pop_back_val();
-      findUnsafeValues(std::get<0>(current), std::get<1>(current));
-    }
-
-    // Update buffer aliases to ensure that we free all buffers and block
-    // arguments at the correct locations.
-    aliases.remove(valuesToFree);
-
-    // Add new allocs and additional clone operations.
-    for (Value value : valuesToFree) {
-      if (failed(isa<BlockArgument>(value)
-                     ? introduceBlockArgCopy(cast<BlockArgument>(value))
-                     : introduceValueCopyForRegionResult(value)))
-        return failure();
-
-      // Register the value to require a final dealloc. Note that we do not have
-      // to assign a block here since we do not want to move the allocation node
-      // to another location.
-      allocs.registerAlloc(std::make_tuple(value, nullptr));
-    }
-    return success();
-  }
-
-  /// Introduces temporary clones in all predecessors and copies the source
-  /// values into the newly allocated buffers.
-  LogicalResult introduceBlockArgCopy(BlockArgument blockArg) {
-    // Allocate a buffer for the current block argument in the block of
-    // the associated value (which will be a predecessor block by
-    // definition).
-    Block *block = blockArg.getOwner();
-    for (auto it = block->pred_begin(), e = block->pred_end(); it != e; ++it) {
-      // Get the terminator and the value that will be passed to our
-      // argument.
-      Operation *terminator = (*it)->getTerminator();
-      auto branchInterface = cast<BranchOpInterface>(terminator);
-      SuccessorOperands operands =
-          branchInterface.getSuccessorOperands(it.getSuccessorIndex());
-
-      // Query the associated source value.
-      Value sourceValue = operands[blockArg.getArgNumber()];
-      if (!sourceValue) {
-        return failure();
-      }
-      // Wire new clone and successor operand.
-      // Create a new clone at the current location of the terminator.
-      auto clone = introduceCloneBuffers(sourceValue, terminator);
-      if (failed(clone))
-        return failure();
-      operands.slice(blockArg.getArgNumber(), 1).assign(*clone);
-    }
-
-    // Check whether the block argument has implicitly defined predecessors via
-    // the RegionBranchOpInterface. This can be the case if the current block
-    // argument belongs to the first block in a region and the parent operation
-    // implements the RegionBranchOpInterface.
-    Region *argRegion = block->getParent();
-    Operation *parentOp = argRegion->getParentOp();
-    RegionBranchOpInterface regionInterface;
-    if (&argRegion->front() != block ||
-        !(regionInterface = dyn_cast<RegionBranchOpInterface>(parentOp)))
-      return success();
-
-    if (failed(introduceClonesForRegionSuccessors(
-            regionInterface, argRegion->getParentOp()->getRegions(), blockArg,
-            [&](RegionSuccessor &successorRegion) {
-              // Find a predecessor of our argRegion.
-              return successorRegion.getSuccessor() == argRegion;
-            })))
-      return failure();
-
-    // Check whether the block argument belongs to an entry region of the
-    // parent operation. In this case, we have to introduce an additional clone
-    // for the buffer that is passed to the argument.
-    SmallVector<RegionSuccessor, 2> successorRegions;
-    regionInterface.getSuccessorRegions(/*point=*/RegionBranchPoint::parent(),
-                                        successorRegions);
-    auto *it =
-        llvm::find_if(successorRegions, [&](RegionSuccessor &successorRegion) {
-          return successorRegion.getSuccessor() == argRegion;
-        });
-    if (it == successorRegions.end())
-      return success();
-
-    // Determine the actual operand to introduce a clone for and rewire the
-    // operand to point to the clone instead.
-    auto operands = regionInterface.getEntrySuccessorOperands(argRegion);
-    size_t operandIndex =
-        llvm::find(it->getSuccessorInputs(), blockArg).getIndex() +
-        operands.getBeginOperandIndex();
-    Value operand = parentOp->getOperand(operandIndex);
-    assert(operand ==
-               operands[operandIndex - operands.getBeginOperandIndex()] &&
-           "region interface operands don't match parentOp operands");
-    auto clone = introduceCloneBuffers(operand, parentOp);
-    if (failed(clone))
-      return failure();
-
-    parentOp->setOperand(operandIndex, *clone);
-    return success();
-  }
-
-  /// Introduces temporary clones in front of all associated nested-region
-  /// terminators and copies the source values into the newly allocated buffers.
-  LogicalResult introduceValueCopyForRegionResult(Value value) {
-    // Get the actual result index in the scope of the parent terminator.
-    Operation *operation = value.getDefiningOp();
-    auto regionInterface = cast<RegionBranchOpInterface>(operation);
-    // Filter successors that return to the parent operation.
-    auto regionPredicate = [&](RegionSuccessor &successorRegion) {
-      // If the RegionSuccessor has no associated successor, it will return to
-      // its parent operation.
-      return !successorRegion.getSuccessor();
-    };
-    // Introduce a clone for all region "results" that are returned to the
-    // parent operation. This is required since the parent's result value has
-    // been considered critical. Therefore, the algorithm assumes that a clone
-    // of a previously allocated buffer is returned by the operation (like in
-    // the case of a block argument).
-    return introduceClonesForRegionSuccessors(
-        regionInterface, operation->getRegions(), value, regionPredicate);
-  }
-
-  /// Introduces buffer clones for all terminators in the given regions. The
-  /// regionPredicate is applied to every successor region in order to restrict
-  /// the clones to specific regions.
-  template <typename TPredicate>
-  LogicalResult introduceClonesForRegionSuccessors(
-      RegionBranchOpInterface regionInterface, MutableArrayRef<Region> regions,
-      Value argValue, const TPredicate &regionPredicate) {
-    for (Region &region : regions) {
-      // Query the regionInterface to get all successor regions of the current
-      // one.
-      SmallVector<RegionSuccessor, 2> successorRegions;
-      regionInterface.getSuccessorRegions(region, successorRegions);
-      // Try to find a matching region successor.
-      RegionSuccessor *regionSuccessor =
-          llvm::find_if(successorRegions, regionPredicate);
-      if (regionSuccessor == successorRegions.end())
-        continue;
-      // Get the operand index in the context of the current successor input
-      // bindings.
-      size_t operandIndex =
-          llvm::find(regionSuccessor->getSuccessorInputs(), argValue)
-              .getIndex();
-
-      // Iterate over all immediate terminator operations to introduce
-      // new buffer allocations. Thereby, the appropriate terminator operand
-      // will be adjusted to point to the newly allocated buffer instead.
-      if (failed(walkReturnOperations(
-              &region, [&](RegionBranchTerminatorOpInterface terminator) {
-                // Get the actual mutable operands for this terminator op.
-                auto terminatorOperands =
-                    terminator.getMutableSuccessorOperands(*regionSuccessor);
-                // Extract the source value from the current terminator.
-                // This conversion needs to exist on a separate line due to a
-                // bug in GCC conversion analysis.
-                OperandRange immutableTerminatorOperands = terminatorOperands;
-                Value sourceValue = immutableTerminatorOperands[operandIndex];
-                // Create a new clone at the current location of the terminator.
-                auto clone = introduceCloneBuffers(sourceValue, terminator);
-                if (failed(clone))
-                  return failure();
-                // Wire clone and terminator operand.
-                terminatorOperands.slice(operandIndex, 1).assign(*clone);
-                return success();
-              })))
-        return failure();
-    }
-    return success();
-  }
-
-  /// Creates a new memory allocation for the given source value and clones
-  /// its content into the newly allocated buffer. The terminator operation is
-  /// used to insert the clone operation at the right place.
-  FailureOr<Value> introduceCloneBuffers(Value sourceValue,
-                                         Operation *terminator) {
-    // Avoid multiple clones of the same source value. This can happen in the
-    // presence of loops when a branch acts as a backedge while also having
-    // another successor that returns to its parent operation. Note that
-    // copying copied buffers can introduce memory leaks since the invariant of
-    // BufferDeallocation assumes that a buffer will be only cloned once into a
-    // temporary buffer. Hence, the construction of clone chains introduces
-    // additional allocations that are not tracked automatically by the
-    // algorithm.
-    if (clonedValues.contains(sourceValue))
-      return sourceValue;
-    // Create a new clone operation that copies the contents of the old
-    // buffer to the new one.
-    auto clone = buildClone(terminator, sourceValue);
-    if (succeeded(clone)) {
-      // Remember the clone of original source value.
-      clonedValues.insert(*clone);
-    }
-    return clone;
-  }
-
-  /// Finds correct dealloc positions according to the algorithm described at
-  /// the top of the file for all alloc nodes and block arguments that can be
-  /// handled by this analysis.
-  LogicalResult placeDeallocs() {
-    // Move or insert deallocs using the previously computed information.
-    // These deallocations will be linked to their associated allocation nodes
-    // since they don't have any aliases that can (potentially) increase their
-    // liveness.
- for (const BufferPlacementAllocs::AllocEntry &entry : allocs) { - Value alloc = std::get<0>(entry); - auto aliasesSet = aliases.resolve(alloc); - assert(!aliasesSet.empty() && "must contain at least one alias"); - - // Determine the actual block to place the dealloc and get liveness - // information. - Block *placementBlock = - findCommonDominator(alloc, aliasesSet, postDominators); - const LivenessBlockInfo *livenessInfo = - liveness.getLiveness(placementBlock); - - // We have to ensure that the dealloc will be after the last use of all - // aliases of the given value. We first assume that there are no uses in - // the placementBlock and that we can safely place the dealloc at the - // beginning. - Operation *endOperation = &placementBlock->front(); - - // Iterate over all aliases and ensure that the endOperation will point - // to the last operation of all potential aliases in the placementBlock. - for (Value alias : aliasesSet) { - // Ensure that the start operation is at least the defining operation of - // the current alias to avoid invalid placement of deallocs for aliases - // without any uses. - Operation *beforeOp = endOperation; - if (alias.getDefiningOp() && - !(beforeOp = placementBlock->findAncestorOpInBlock( - *alias.getDefiningOp()))) - continue; - - Operation *aliasEndOperation = - livenessInfo->getEndOperation(alias, beforeOp); - // Check whether the aliasEndOperation lies in the desired block and - // whether it is behind the current endOperation. If yes, this will be - // the new endOperation. - if (aliasEndOperation->getBlock() == placementBlock && - endOperation->isBeforeInBlock(aliasEndOperation)) - endOperation = aliasEndOperation; - } - // endOperation is the last operation behind which we can safely store - // the dealloc taking all potential aliases into account. - - // If there is an existing dealloc, move it to the right place. 
-      Operation *deallocOperation = std::get<1>(entry);
-      if (deallocOperation) {
-        deallocOperation->moveAfter(endOperation);
-      } else {
-        // If the Dealloc position is at the terminator operation of the
-        // block, then the value should escape from a deallocation.
-        Operation *nextOp = endOperation->getNextNode();
-        if (!nextOp)
-          continue;
-        // If there is no dealloc node, insert one in the right place.
-        if (failed(buildDealloc(nextOp, alloc)))
-          return failure();
-      }
-    }
-    return success();
-  }
-
-  /// Builds a deallocation operation compatible with the given allocation
-  /// value. If there is no registered AllocationOpInterface implementation for
-  /// the given value (e.g. in the case of a function parameter), this method
-  /// builds a memref::DeallocOp.
-  LogicalResult buildDealloc(Operation *op, Value alloc) {
-    OpBuilder builder(op);
-    auto it = aliasToAllocations.find(alloc);
-    if (it != aliasToAllocations.end()) {
-      // Call the allocation op interface to build a supported and
-      // compatible deallocation operation.
-      auto dealloc = it->second.buildDealloc(builder, alloc);
-      if (!dealloc)
-        return op->emitError()
-               << "allocations without compatible deallocations are "
-                  "not supported";
-    } else {
-      // Build a "default" DeallocOp for unknown allocation sources.
-      builder.create<memref::DeallocOp>(alloc.getLoc(), alloc);
-    }
-    return success();
-  }
-
-  /// Builds a clone operation compatible with the given allocation value. If
-  /// there is no registered AllocationOpInterface implementation for the given
-  /// value (e.g. in the case of a function parameter), this method builds a
-  /// bufferization::CloneOp.
-  FailureOr<Value> buildClone(Operation *op, Value alloc) {
-    OpBuilder builder(op);
-    auto it = aliasToAllocations.find(alloc);
-    if (it != aliasToAllocations.end()) {
-      // Call the allocation op interface to build a supported and
-      // compatible clone operation.
-      auto clone = it->second.buildClone(builder, alloc);
-      if (clone)
-        return *clone;
-      return (LogicalResult)(op->emitError()
-                             << "allocations without compatible clone ops "
-                                "are not supported");
-    }
-    // Build a "default" CloneOp for unknown allocation sources.
-    return builder.create<bufferization::CloneOp>(alloc.getLoc(), alloc)
-        .getResult();
-  }
-
-  /// The dominator info to find the appropriate start operation to move the
-  /// allocs.
-  DominanceInfo dominators;
-
-  /// The post dominator info to move the dependent allocs in the right
-  /// position.
-  PostDominanceInfo postDominators;
-
-  /// Stores already cloned buffers to avoid additional clones of clones.
-  ValueSetT clonedValues;
-
-  /// Maps aliases to their source allocation interfaces (inverse mapping).
-  AliasAllocationMapT aliasToAllocations;
-};
-
-//===----------------------------------------------------------------------===//
-// BufferDeallocationPass
-//===----------------------------------------------------------------------===//
-
-/// The actual buffer deallocation pass that inserts and moves dealloc nodes
-/// into the right positions. Furthermore, it inserts additional clones if
-/// necessary. It uses the algorithm described at the top of the file.
-struct BufferDeallocationPass
-    : public bufferization::impl::BufferDeallocationBase<
-          BufferDeallocationPass> {
-  void getDependentDialects(DialectRegistry &registry) const override {
-    registry.insert<bufferization::BufferizationDialect>();
-    registry.insert<memref::MemRefDialect>();
-  }
-
-  void runOnOperation() override {
-    func::FuncOp func = getOperation();
-    if (func.isExternal())
-      return;
-
-    if (failed(deallocateBuffers(func)))
-      signalPassFailure();
-  }
-};
-
-} // namespace
-
-LogicalResult bufferization::deallocateBuffers(Operation *op) {
-  if (isa<ModuleOp>(op)) {
-    WalkResult result = op->walk([&](func::FuncOp funcOp) {
-      if (failed(deallocateBuffers(funcOp)))
-        return WalkResult::interrupt();
-      return WalkResult::advance();
-    });
-    return success(!result.wasInterrupted());
-  }
-
-  // Ensure that there are supported loops only.
-  Backedges backedges(op);
-  if (backedges.size()) {
-    op->emitError("Only structured control-flow loops are supported.");
-    return failure();
-  }
-
-  // Check that the control flow structures are supported.
-  if (!validateSupportedControlFlow(op))
-    return failure();
-
-  // Gather all required allocation nodes and prepare the deallocation phase.
-  BufferDeallocation deallocation(op);
-
-  // Check for supported AllocationOpInterface implementations and prepare the
-  // internal deallocation pass.
-  if (failed(deallocation.prepare()))
-    return failure();
-
-  // Place all required temporary clone and dealloc nodes.
-  if (failed(deallocation.deallocate()))
-    return failure();
-
-  return success();
-}
-
-//===----------------------------------------------------------------------===//
-// BufferDeallocationPass construction
-//===----------------------------------------------------------------------===//
-
-std::unique_ptr<Pass> mlir::bufferization::createBufferDeallocationPass() {
-  return std::make_unique<BufferDeallocationPass>();
-}
diff --git a/mlir/lib/Dialect/Bufferization/Transforms/CMakeLists.txt b/mlir/lib/Dialect/Bufferization/Transforms/CMakeLists.txt
index 50104e8f8346b..7c38621be1bb5 100644
--- a/mlir/lib/Dialect/Bufferization/Transforms/CMakeLists.txt
+++ b/mlir/lib/Dialect/Bufferization/Transforms/CMakeLists.txt
@@ -1,6 +1,5 @@
 add_mlir_dialect_library(MLIRBufferizationTransforms
   Bufferize.cpp
-  BufferDeallocation.cpp
   BufferDeallocationSimplification.cpp
   BufferOptimizations.cpp
   BufferResultsToOutParams.cpp
diff --git a/mlir/test/Dialect/Bufferization/Transforms/buffer-deallocation.mlir b/mlir/test/Dialect/Bufferization/Transforms/buffer-deallocation.mlir
deleted file mode 100644
index 3fbe3913c6549..0000000000000
--- a/mlir/test/Dialect/Bufferization/Transforms/buffer-deallocation.mlir
+++ /dev/null
@@ -1,1462 +0,0 @@
-// RUN: mlir-opt -verify-diagnostics -buffer-deallocation -split-input-file %s | FileCheck %s
-
-// This file checks the behaviour of the BufferDeallocation pass for moving and
-// inserting missing DeallocOps in their correct positions. Furthermore,
-// copies and their corresponding AllocOps are inserted.
-
-// Test Case:
-//    bb0
-//   /   \
-//  bb1  bb2 <- Initial position of AllocOp
-//   \   /
-//    bb3
-// BufferDeallocation expected behavior: bb2 contains an AllocOp which is
-// passed to bb3. In the latter block, there should be a deallocation.
-// Since bb1 does not contain an adequate alloc and the alloc in bb2 is not
-// moved to bb0, we need to insert allocs and copies.
- -// CHECK-LABEL: func @condBranch -func.func @condBranch(%arg0: i1, %arg1: memref<2xf32>, %arg2: memref<2xf32>) { - cf.cond_br %arg0, ^bb1, ^bb2 -^bb1: - cf.br ^bb3(%arg1 : memref<2xf32>) -^bb2: - %0 = memref.alloc() : memref<2xf32> - test.buffer_based in(%arg1: memref<2xf32>) out(%0: memref<2xf32>) - cf.br ^bb3(%0 : memref<2xf32>) -^bb3(%1: memref<2xf32>): - test.copy(%1, %arg2) : (memref<2xf32>, memref<2xf32>) - return -} - -// CHECK-NEXT: cf.cond_br -// CHECK: %[[ALLOC0:.*]] = bufferization.clone -// CHECK-NEXT: cf.br ^bb3(%[[ALLOC0]] -// CHECK: %[[ALLOC1:.*]] = memref.alloc -// CHECK-NEXT: test.buffer_based -// CHECK-NEXT: %[[ALLOC2:.*]] = bufferization.clone %[[ALLOC1]] -// CHECK-NEXT: memref.dealloc %[[ALLOC1]] -// CHECK-NEXT: cf.br ^bb3(%[[ALLOC2]] -// CHECK: test.copy -// CHECK-NEXT: memref.dealloc -// CHECK-NEXT: return - -// ----- - -// Test Case: -// bb0 -// / \ -// bb1 bb2 <- Initial position of AllocOp -// \ / -// bb3 -// BufferDeallocation expected behavior: The existing AllocOp has a dynamic -// dependency to block argument %0 in bb2. Since the dynamic type is passed -// to bb3 via the block argument %2, it is currently required to allocate a -// temporary buffer for %2 that gets copies of %arg0 and %1 with their -// appropriate shape dimensions. The copy buffer deallocation will be applied -// to %2 in block bb3. 
- -// CHECK-LABEL: func @condBranchDynamicType -func.func @condBranchDynamicType( - %arg0: i1, - %arg1: memref<?xf32>, - %arg2: memref<?xf32>, - %arg3: index) { - cf.cond_br %arg0, ^bb1, ^bb2(%arg3: index) -^bb1: - cf.br ^bb3(%arg1 : memref<?xf32>) -^bb2(%0: index): - %1 = memref.alloc(%0) : memref<?xf32> - test.buffer_based in(%arg1: memref<?xf32>) out(%1: memref<?xf32>) - cf.br ^bb3(%1 : memref<?xf32>) -^bb3(%2: memref<?xf32>): - test.copy(%2, %arg2) : (memref<?xf32>, memref<?xf32>) - return -} - -// CHECK-NEXT: cf.cond_br -// CHECK: %[[ALLOC0:.*]] = bufferization.clone -// CHECK-NEXT: cf.br ^bb3(%[[ALLOC0]] -// CHECK: ^bb2(%[[IDX:.*]]:{{.*}}) -// CHECK-NEXT: %[[ALLOC1:.*]] = memref.alloc(%[[IDX]]) -// CHECK-NEXT: test.buffer_based -// CHECK-NEXT: %[[ALLOC2:.*]] = bufferization.clone -// CHECK-NEXT: memref.dealloc %[[ALLOC1]] -// CHECK-NEXT: cf.br ^bb3 -// CHECK-NEXT: ^bb3(%[[ALLOC3:.*]]:{{.*}}) -// CHECK: test.copy(%[[ALLOC3]], -// CHECK-NEXT: memref.dealloc %[[ALLOC3]] -// CHECK-NEXT: return - -// ----- - -// Test case: See above. - -// CHECK-LABEL: func @condBranchUnrankedType -func.func @condBranchUnrankedType( - %arg0: i1, - %arg1: memref<*xf32>, - %arg2: memref<*xf32>, - %arg3: index) { - cf.cond_br %arg0, ^bb1, ^bb2(%arg3: index) -^bb1: - cf.br ^bb3(%arg1 : memref<*xf32>) -^bb2(%0: index): - %1 = memref.alloc(%0) : memref<?xf32> - %2 = memref.cast %1 : memref<?xf32> to memref<*xf32> - test.buffer_based in(%arg1: memref<*xf32>) out(%2: memref<*xf32>) - cf.br ^bb3(%2 : memref<*xf32>) -^bb3(%3: memref<*xf32>): - test.copy(%3, %arg2) : (memref<*xf32>, memref<*xf32>) - return -} - -// CHECK-NEXT: cf.cond_br -// CHECK: %[[ALLOC0:.*]] = bufferization.clone -// CHECK-NEXT: cf.br ^bb3(%[[ALLOC0]] -// CHECK: ^bb2(%[[IDX:.*]]:{{.*}}) -// CHECK-NEXT: %[[ALLOC1:.*]] = memref.alloc(%[[IDX]]) -// CHECK: test.buffer_based -// CHECK-NEXT: %[[ALLOC2:.*]] = bufferization.clone -// CHECK-NEXT: memref.dealloc %[[ALLOC1]] -// CHECK-NEXT: cf.br ^bb3 -// CHECK-NEXT: ^bb3(%[[ALLOC3:.*]]:{{.*}}) -// CHECK: test.copy(%[[ALLOC3]], -// CHECK-NEXT: memref.dealloc
%[[ALLOC3]] -// CHECK-NEXT: return - -// ----- - -// Test Case: -// bb0 -// / \ -// bb1 bb2 <- Initial position of AllocOp -// | / \ -// | bb3 bb4 -// | \ / -// \ bb5 -// \ / -// bb6 -// | -// bb7 -// BufferDeallocation expected behavior: The existing AllocOp has a dynamic -// dependency to block argument %0 in bb2. Since the dynamic type is passed to -// bb5 via the block argument %2 and to bb6 via block argument %3, it is -// currently required to allocate temporary buffers for %2 and %3 that get -// copies of %1 and %arg0 with their appropriate shape dimensions. The copy -// buffer deallocations will be applied to %2 in block bb5 and to %3 in block -// bb6. Furthermore, there should be no copy inserted for %4. - -// CHECK-LABEL: func @condBranchDynamicTypeNested -func.func @condBranchDynamicTypeNested( - %arg0: i1, - %arg1: memref<?xf32>, - %arg2: memref<?xf32>, - %arg3: index) { - cf.cond_br %arg0, ^bb1, ^bb2(%arg3: index) -^bb1: - cf.br ^bb6(%arg1 : memref<?xf32>) -^bb2(%0: index): - %1 = memref.alloc(%0) : memref<?xf32> - test.buffer_based in(%arg1: memref<?xf32>) out(%1: memref<?xf32>) - cf.cond_br %arg0, ^bb3, ^bb4 -^bb3: - cf.br ^bb5(%1 : memref<?xf32>) -^bb4: - cf.br ^bb5(%1 : memref<?xf32>) -^bb5(%2: memref<?xf32>): - cf.br ^bb6(%2 : memref<?xf32>) -^bb6(%3: memref<?xf32>): - cf.br ^bb7(%3 : memref<?xf32>) -^bb7(%4: memref<?xf32>): - test.copy(%4, %arg2) : (memref<?xf32>, memref<?xf32>) - return -} - -// CHECK-NEXT: cf.cond_br{{.*}} -// CHECK-NEXT: ^bb1 -// CHECK-NEXT: %[[ALLOC0:.*]] = bufferization.clone -// CHECK-NEXT: cf.br ^bb6(%[[ALLOC0]] -// CHECK: ^bb2(%[[IDX:.*]]:{{.*}}) -// CHECK-NEXT: %[[ALLOC1:.*]] = memref.alloc(%[[IDX]]) -// CHECK-NEXT: test.buffer_based -// CHECK: cf.cond_br -// CHECK: ^bb3: -// CHECK-NEXT: cf.br ^bb5(%[[ALLOC1]]{{.*}}) -// CHECK: ^bb4: -// CHECK-NEXT: cf.br ^bb5(%[[ALLOC1]]{{.*}}) -// CHECK-NEXT: ^bb5(%[[ALLOC2:.*]]:{{.*}}) -// CHECK-NEXT: %[[ALLOC3:.*]] = bufferization.clone %[[ALLOC2]] -// CHECK-NEXT: memref.dealloc %[[ALLOC1]] -// CHECK-NEXT: cf.br ^bb6(%[[ALLOC3]]{{.*}}) -// CHECK-NEXT: ^bb6(%[[ALLOC4:.*]]:{{.*}}) -//
CHECK-NEXT: cf.br ^bb7(%[[ALLOC4]]{{.*}}) -// CHECK-NEXT: ^bb7(%[[ALLOC5:.*]]:{{.*}}) -// CHECK: test.copy(%[[ALLOC5]], -// CHECK-NEXT: memref.dealloc %[[ALLOC4]] -// CHECK-NEXT: return - -// ----- - -// Test Case: Existing AllocOp with no users. -// BufferDeallocation expected behavior: It should insert a DeallocOp right -// before ReturnOp. - -// CHECK-LABEL: func @emptyUsesValue -func.func @emptyUsesValue(%arg0: memref<4xf32>) { - %0 = memref.alloc() : memref<4xf32> - return -} -// CHECK-NEXT: %[[ALLOC:.*]] = memref.alloc() -// CHECK-NEXT: memref.dealloc %[[ALLOC]] -// CHECK-NEXT: return - -// ----- - -// Test Case: -// bb0 -// / \ -// | bb1 <- Initial position of AllocOp -// \ / -// bb2 -// BufferDeallocation expected behavior: It should insert a DeallocOp at the -// exit block after CopyOp since %1 is an alias for %0 and %arg1. Furthermore, -// we have to insert a copy and an alloc in the beginning of the function. - -// CHECK-LABEL: func @criticalEdge -func.func @criticalEdge(%arg0: i1, %arg1: memref<2xf32>, %arg2: memref<2xf32>) { - cf.cond_br %arg0, ^bb1, ^bb2(%arg1 : memref<2xf32>) -^bb1: - %0 = memref.alloc() : memref<2xf32> - test.buffer_based in(%arg1: memref<2xf32>) out(%0: memref<2xf32>) - cf.br ^bb2(%0 : memref<2xf32>) -^bb2(%1: memref<2xf32>): - test.copy(%1, %arg2) : (memref<2xf32>, memref<2xf32>) - return -} - -// CHECK-NEXT: %[[ALLOC0:.*]] = bufferization.clone -// CHECK-NEXT: cf.cond_br -// CHECK: %[[ALLOC1:.*]] = memref.alloc() -// CHECK-NEXT: test.buffer_based -// CHECK-NEXT: %[[ALLOC2:.*]] = bufferization.clone %[[ALLOC1]] -// CHECK-NEXT: memref.dealloc %[[ALLOC1]] -// CHECK: test.copy -// CHECK-NEXT: memref.dealloc -// CHECK-NEXT: return - -// ----- - -// Test Case: -// bb0 <- Initial position of AllocOp -// / \ -// | bb1 -// \ / -// bb2 -// BufferDeallocation expected behavior: It only inserts a DeallocOp at the -// exit block after CopyOp since %1 is an alias for %0 and %arg1. 
- -// CHECK-LABEL: func @invCriticalEdge -func.func @invCriticalEdge(%arg0: i1, %arg1: memref<2xf32>, %arg2: memref<2xf32>) { - %0 = memref.alloc() : memref<2xf32> - test.buffer_based in(%arg1: memref<2xf32>) out(%0: memref<2xf32>) - cf.cond_br %arg0, ^bb1, ^bb2(%arg1 : memref<2xf32>) -^bb1: - cf.br ^bb2(%0 : memref<2xf32>) -^bb2(%1: memref<2xf32>): - test.copy(%1, %arg2) : (memref<2xf32>, memref<2xf32>) - return -} - -// CHECK: dealloc -// CHECK-NEXT: return - -// ----- - -// Test Case: -// bb0 <- Initial position of the first AllocOp -// / \ -// bb1 bb2 -// \ / -// bb3 <- Initial position of the second AllocOp -// BufferDeallocation expected behavior: It only inserts two missing -// DeallocOps in the exit block. %5 is an alias for %0. Therefore, the -// DeallocOp for %0 should occur after the last BufferBasedOp. The Dealloc for -// %7 should happen after CopyOp. - -// CHECK-LABEL: func @ifElse -func.func @ifElse(%arg0: i1, %arg1: memref<2xf32>, %arg2: memref<2xf32>) { - %0 = memref.alloc() : memref<2xf32> - test.buffer_based in(%arg1: memref<2xf32>) out(%0: memref<2xf32>) - cf.cond_br %arg0, - ^bb1(%arg1, %0 : memref<2xf32>, memref<2xf32>), - ^bb2(%0, %arg1 : memref<2xf32>, memref<2xf32>) -^bb1(%1: memref<2xf32>, %2: memref<2xf32>): - cf.br ^bb3(%1, %2 : memref<2xf32>, memref<2xf32>) -^bb2(%3: memref<2xf32>, %4: memref<2xf32>): - cf.br ^bb3(%3, %4 : memref<2xf32>, memref<2xf32>) -^bb3(%5: memref<2xf32>, %6: memref<2xf32>): - %7 = memref.alloc() : memref<2xf32> - test.buffer_based in(%5: memref<2xf32>) out(%7: memref<2xf32>) - test.copy(%7, %arg2) : (memref<2xf32>, memref<2xf32>) - return -} - -// CHECK-NEXT: %[[FIRST_ALLOC:.*]] = memref.alloc() -// CHECK-NEXT: test.buffer_based -// CHECK: %[[SECOND_ALLOC:.*]] = memref.alloc() -// CHECK-NEXT: test.buffer_based -// CHECK: memref.dealloc %[[FIRST_ALLOC]] -// CHECK: test.copy -// CHECK-NEXT: memref.dealloc %[[SECOND_ALLOC]] -// CHECK-NEXT: return - -// ----- - -// Test Case: No users for buffer in if-else CFG -// bb0 
<- Initial position of AllocOp -// / \ -// bb1 bb2 -// \ / -// bb3 -// BufferDeallocation expected behavior: It only inserts a missing DeallocOp -// in the exit block since %5 or %6 are the latest aliases of %0. - -// CHECK-LABEL: func @ifElseNoUsers -func.func @ifElseNoUsers(%arg0: i1, %arg1: memref<2xf32>, %arg2: memref<2xf32>) { - %0 = memref.alloc() : memref<2xf32> - test.buffer_based in(%arg1: memref<2xf32>) out(%0: memref<2xf32>) - cf.cond_br %arg0, - ^bb1(%arg1, %0 : memref<2xf32>, memref<2xf32>), - ^bb2(%0, %arg1 : memref<2xf32>, memref<2xf32>) -^bb1(%1: memref<2xf32>, %2: memref<2xf32>): - cf.br ^bb3(%1, %2 : memref<2xf32>, memref<2xf32>) -^bb2(%3: memref<2xf32>, %4: memref<2xf32>): - cf.br ^bb3(%3, %4 : memref<2xf32>, memref<2xf32>) -^bb3(%5: memref<2xf32>, %6: memref<2xf32>): - test.copy(%arg1, %arg2) : (memref<2xf32>, memref<2xf32>) - return -} - -// CHECK-NEXT: %[[FIRST_ALLOC:.*]] = memref.alloc() -// CHECK: test.copy -// CHECK-NEXT: memref.dealloc %[[FIRST_ALLOC]] -// CHECK-NEXT: return - -// ----- - -// Test Case: -// bb0 <- Initial position of the first AllocOp -// / \ -// bb1 bb2 -// | / \ -// | bb3 bb4 -// \ \ / -// \ / -// bb5 <- Initial position of the second AllocOp -// BufferDeallocation expected behavior: Two missing DeallocOps should be -// inserted in the exit block. 
- -// CHECK-LABEL: func @ifElseNested -func.func @ifElseNested(%arg0: i1, %arg1: memref<2xf32>, %arg2: memref<2xf32>) { - %0 = memref.alloc() : memref<2xf32> - test.buffer_based in(%arg1: memref<2xf32>) out(%0: memref<2xf32>) - cf.cond_br %arg0, - ^bb1(%arg1, %0 : memref<2xf32>, memref<2xf32>), - ^bb2(%0, %arg1 : memref<2xf32>, memref<2xf32>) -^bb1(%1: memref<2xf32>, %2: memref<2xf32>): - cf.br ^bb5(%1, %2 : memref<2xf32>, memref<2xf32>) -^bb2(%3: memref<2xf32>, %4: memref<2xf32>): - cf.cond_br %arg0, ^bb3(%3 : memref<2xf32>), ^bb4(%4 : memref<2xf32>) -^bb3(%5: memref<2xf32>): - cf.br ^bb5(%5, %3 : memref<2xf32>, memref<2xf32>) -^bb4(%6: memref<2xf32>): - cf.br ^bb5(%3, %6 : memref<2xf32>, memref<2xf32>) -^bb5(%7: memref<2xf32>, %8: memref<2xf32>): - %9 = memref.alloc() : memref<2xf32> - test.buffer_based in(%7: memref<2xf32>) out(%9: memref<2xf32>) - test.copy(%9, %arg2) : (memref<2xf32>, memref<2xf32>) - return -} - -// CHECK-NEXT: %[[FIRST_ALLOC:.*]] = memref.alloc() -// CHECK-NEXT: test.buffer_based -// CHECK: %[[SECOND_ALLOC:.*]] = memref.alloc() -// CHECK-NEXT: test.buffer_based -// CHECK: memref.dealloc %[[FIRST_ALLOC]] -// CHECK: test.copy -// CHECK-NEXT: memref.dealloc %[[SECOND_ALLOC]] -// CHECK-NEXT: return - -// ----- - -// Test Case: Dead operations in a single block. -// BufferDeallocation expected behavior: It only inserts the two missing -// DeallocOps after the last BufferBasedOp. 
- -// CHECK-LABEL: func @redundantOperations -func.func @redundantOperations(%arg0: memref<2xf32>) { - %0 = memref.alloc() : memref<2xf32> - test.buffer_based in(%arg0: memref<2xf32>) out(%0: memref<2xf32>) - %1 = memref.alloc() : memref<2xf32> - test.buffer_based in(%0: memref<2xf32>) out(%1: memref<2xf32>) - return -} - -// CHECK: (%[[ARG0:.*]]: {{.*}}) -// CHECK-NEXT: %[[FIRST_ALLOC:.*]] = memref.alloc() -// CHECK-NEXT: test.buffer_based in(%[[ARG0]]{{.*}}out(%[[FIRST_ALLOC]] -// CHECK: %[[SECOND_ALLOC:.*]] = memref.alloc() -// CHECK-NEXT: test.buffer_based in(%[[FIRST_ALLOC]]{{.*}}out(%[[SECOND_ALLOC]] -// CHECK: dealloc -// CHECK-NEXT: dealloc -// CHECK-NEXT: return - -// ----- - -// Test Case: -// bb0 -// / \ -// Initial pos of the 1st AllocOp -> bb1 bb2 <- Initial pos of the 2nd AllocOp -// \ / -// bb3 -// BufferDeallocation expected behavior: We need to introduce a copy for each -// buffer since the buffers are passed to bb3. Both missing DeallocOps are -// inserted in the respective blocks of the allocs. The copy is freed in the exit -// block.
- -// CHECK-LABEL: func @moving_alloc_and_inserting_missing_dealloc -func.func @moving_alloc_and_inserting_missing_dealloc( - %cond: i1, - %arg0: memref<2xf32>, - %arg1: memref<2xf32>) { - cf.cond_br %cond, ^bb1, ^bb2 -^bb1: - %0 = memref.alloc() : memref<2xf32> - test.buffer_based in(%arg0: memref<2xf32>) out(%0: memref<2xf32>) - cf.br ^exit(%0 : memref<2xf32>) -^bb2: - %1 = memref.alloc() : memref<2xf32> - test.buffer_based in(%arg0: memref<2xf32>) out(%1: memref<2xf32>) - cf.br ^exit(%1 : memref<2xf32>) -^exit(%arg2: memref<2xf32>): - test.copy(%arg2, %arg1) : (memref<2xf32>, memref<2xf32>) - return -} - -// CHECK-NEXT: cf.cond_br{{.*}} -// CHECK-NEXT: ^bb1 -// CHECK: %[[ALLOC0:.*]] = memref.alloc() -// CHECK-NEXT: test.buffer_based -// CHECK-NEXT: %[[ALLOC1:.*]] = bufferization.clone %[[ALLOC0]] -// CHECK-NEXT: memref.dealloc %[[ALLOC0]] -// CHECK-NEXT: cf.br ^bb3(%[[ALLOC1]] -// CHECK-NEXT: ^bb2 -// CHECK-NEXT: %[[ALLOC2:.*]] = memref.alloc() -// CHECK-NEXT: test.buffer_based -// CHECK-NEXT: %[[ALLOC3:.*]] = bufferization.clone %[[ALLOC2]] -// CHECK-NEXT: memref.dealloc %[[ALLOC2]] -// CHECK-NEXT: cf.br ^bb3(%[[ALLOC3]] -// CHECK-NEXT: ^bb3(%[[ALLOC4:.*]]:{{.*}}) -// CHECK: test.copy -// CHECK-NEXT: memref.dealloc %[[ALLOC4]] -// CHECK-NEXT: return - -// ----- - -// Test Case: Invalid position of the DeallocOp. There is a user after -// deallocation. -// bb0 -// / \ -// bb1 bb2 <- Initial position of AllocOp -// \ / -// bb3 -// BufferDeallocation expected behavior: The existing DeallocOp should be -// moved to the exit block.
- -// CHECK-LABEL: func @moving_invalid_dealloc_op_complex -func.func @moving_invalid_dealloc_op_complex( - %cond: i1, - %arg0: memref<2xf32>, - %arg1: memref<2xf32>) { - %1 = memref.alloc() : memref<2xf32> - cf.cond_br %cond, ^bb1, ^bb2 -^bb1: - cf.br ^exit(%arg0 : memref<2xf32>) -^bb2: - test.buffer_based in(%arg0: memref<2xf32>) out(%1: memref<2xf32>) - memref.dealloc %1 : memref<2xf32> - cf.br ^exit(%1 : memref<2xf32>) -^exit(%arg2: memref<2xf32>): - test.copy(%arg2, %arg1) : (memref<2xf32>, memref<2xf32>) - return -} - -// CHECK-NEXT: %[[ALLOC0:.*]] = memref.alloc() -// CHECK-NEXT: cf.cond_br -// CHECK: test.copy -// CHECK-NEXT: memref.dealloc %[[ALLOC0]] -// CHECK-NEXT: return - -// ----- - -// Test Case: Inserting missing DeallocOp in a single block. - -// CHECK-LABEL: func @inserting_missing_dealloc_simple -func.func @inserting_missing_dealloc_simple( - %arg0 : memref<2xf32>, - %arg1: memref<2xf32>) { - %0 = memref.alloc() : memref<2xf32> - test.buffer_based in(%arg0: memref<2xf32>) out(%0: memref<2xf32>) - test.copy(%0, %arg1) : (memref<2xf32>, memref<2xf32>) - return -} - -// CHECK-NEXT: %[[ALLOC0:.*]] = memref.alloc() -// CHECK: test.copy -// CHECK-NEXT: memref.dealloc %[[ALLOC0]] - -// ----- - -// Test Case: Moving invalid DeallocOp (there is a user after deallocation) in a -// single block. - -// CHECK-LABEL: func @moving_invalid_dealloc_op -func.func @moving_invalid_dealloc_op(%arg0 : memref<2xf32>, %arg1: memref<2xf32>) { - %0 = memref.alloc() : memref<2xf32> - test.buffer_based in(%arg0: memref<2xf32>) out(%0: memref<2xf32>) - memref.dealloc %0 : memref<2xf32> - test.copy(%0, %arg1) : (memref<2xf32>, memref<2xf32>) - return -} - -// CHECK-NEXT: %[[ALLOC0:.*]] = memref.alloc() -// CHECK: test.copy -// CHECK-NEXT: memref.dealloc %[[ALLOC0]] - -// ----- - -// Test Case: Nested regions - This test defines a BufferBasedOp inside the -// region of a RegionBufferBasedOp. 
-// BufferDeallocation expected behavior: The AllocOp for the BufferBasedOp -// should remain inside the region of the RegionBufferBasedOp, and it should insert -// the missing DeallocOp in the same region. The missing DeallocOp should be -// inserted after CopyOp. - -// CHECK-LABEL: func @nested_regions_and_cond_branch -func.func @nested_regions_and_cond_branch( - %arg0: i1, - %arg1: memref<2xf32>, - %arg2: memref<2xf32>) { - cf.cond_br %arg0, ^bb1, ^bb2 -^bb1: - cf.br ^bb3(%arg1 : memref<2xf32>) -^bb2: - %0 = memref.alloc() : memref<2xf32> - test.region_buffer_based in(%arg1: memref<2xf32>) out(%0: memref<2xf32>) { - ^bb0(%gen1_arg0: f32, %gen1_arg1: f32): - %1 = memref.alloc() : memref<2xf32> - test.buffer_based in(%arg1: memref<2xf32>) out(%1: memref<2xf32>) - %tmp1 = math.exp %gen1_arg0 : f32 - test.region_yield %tmp1 : f32 - } - cf.br ^bb3(%0 : memref<2xf32>) -^bb3(%1: memref<2xf32>): - test.copy(%1, %arg2) : (memref<2xf32>, memref<2xf32>) - return -} -// CHECK: (%[[cond:.*]]: {{.*}}, %[[ARG1:.*]]: {{.*}}, %{{.*}}: {{.*}}) -// CHECK-NEXT: cf.cond_br %[[cond]], ^[[BB1:.*]], ^[[BB2:.*]] -// CHECK: %[[ALLOC0:.*]] = bufferization.clone %[[ARG1]] -// CHECK: ^[[BB2]]: -// CHECK: %[[ALLOC1:.*]] = memref.alloc() -// CHECK-NEXT: test.region_buffer_based in(%[[ARG1]]{{.*}}out(%[[ALLOC1]] -// CHECK: %[[ALLOC2:.*]] = memref.alloc() -// CHECK-NEXT: test.buffer_based in(%[[ARG1]]{{.*}}out(%[[ALLOC2]] -// CHECK: memref.dealloc %[[ALLOC2]] -// CHECK-NEXT: %{{.*}} = math.exp -// CHECK: %[[ALLOC3:.*]] = bufferization.clone %[[ALLOC1]] -// CHECK-NEXT: memref.dealloc %[[ALLOC1]] -// CHECK: ^[[BB3:.*]]({{.*}}): -// CHECK: test.copy -// CHECK-NEXT: memref.dealloc - -// ----- - -// Test Case: buffer deallocation escaping -// BufferDeallocation expected behavior: It must not dealloc %arg1 and %x -// since they are operands of the return operation and must escape -// deallocation. It should dealloc %y after CopyOp.
- -// CHECK-LABEL: func @memref_in_function_results -func.func @memref_in_function_results( - %arg0: memref<5xf32>, - %arg1: memref<10xf32>, - %arg2: memref<5xf32>) -> (memref<10xf32>, memref<15xf32>) { - %x = memref.alloc() : memref<15xf32> - %y = memref.alloc() : memref<5xf32> - test.buffer_based in(%arg0: memref<5xf32>) out(%y: memref<5xf32>) - test.copy(%y, %arg2) : (memref<5xf32>, memref<5xf32>) - return %arg1, %x : memref<10xf32>, memref<15xf32> -} -// CHECK: (%[[ARG0:.*]]: memref<5xf32>, %[[ARG1:.*]]: memref<10xf32>, -// CHECK-SAME: %[[RESULT:.*]]: memref<5xf32>) -// CHECK: %[[X:.*]] = memref.alloc() -// CHECK: %[[Y:.*]] = memref.alloc() -// CHECK: test.copy -// CHECK: memref.dealloc %[[Y]] -// CHECK: return %[[ARG1]], %[[X]] - -// ----- - -// Test Case: nested region control flow -// The alloc %1 flows through both if branches until it is finally returned. -// Hence, it does not require a specific dealloc operation. However, %3 -// requires a dealloc. - -// CHECK-LABEL: func @nested_region_control_flow -func.func @nested_region_control_flow( - %arg0 : index, - %arg1 : index) -> memref<?x?xf32> { - %0 = arith.cmpi eq, %arg0, %arg1 : index - %1 = memref.alloc(%arg0, %arg0) : memref<?x?xf32> - %2 = scf.if %0 -> (memref<?x?xf32>) { - scf.yield %1 : memref<?x?xf32> - } else { - %3 = memref.alloc(%arg0, %arg1) : memref<?x?xf32> - scf.yield %1 : memref<?x?xf32> - } - return %2 : memref<?x?xf32> -} - -// CHECK: %[[ALLOC0:.*]] = memref.alloc(%arg0, %arg0) -// CHECK-NEXT: %[[ALLOC1:.*]] = scf.if -// CHECK: scf.yield %[[ALLOC0]] -// CHECK: %[[ALLOC2:.*]] = memref.alloc(%arg0, %arg1) -// CHECK-NEXT: memref.dealloc %[[ALLOC2]] -// CHECK-NEXT: scf.yield %[[ALLOC0]] -// CHECK: return %[[ALLOC1]] - -// ----- - -// Test Case: nested region control flow with a nested buffer allocation in a -// divergent branch. -// Buffer deallocation places a copy for both %1 and %3, since they are -// returned in the end.
- -// CHECK-LABEL: func @nested_region_control_flow_div -func.func @nested_region_control_flow_div( - %arg0 : index, - %arg1 : index) -> memref<?x?xf32> { - %0 = arith.cmpi eq, %arg0, %arg1 : index - %1 = memref.alloc(%arg0, %arg0) : memref<?x?xf32> - %2 = scf.if %0 -> (memref<?x?xf32>) { - scf.yield %1 : memref<?x?xf32> - } else { - %3 = memref.alloc(%arg0, %arg1) : memref<?x?xf32> - scf.yield %3 : memref<?x?xf32> - } - return %2 : memref<?x?xf32> -} - -// CHECK: %[[ALLOC0:.*]] = memref.alloc(%arg0, %arg0) -// CHECK-NEXT: %[[ALLOC1:.*]] = scf.if -// CHECK-NEXT: %[[ALLOC2:.*]] = bufferization.clone %[[ALLOC0]] -// CHECK: scf.yield %[[ALLOC2]] -// CHECK: %[[ALLOC3:.*]] = memref.alloc(%arg0, %arg1) -// CHECK-NEXT: %[[ALLOC4:.*]] = bufferization.clone %[[ALLOC3]] -// CHECK: memref.dealloc %[[ALLOC3]] -// CHECK: scf.yield %[[ALLOC4]] -// CHECK: memref.dealloc %[[ALLOC0]] -// CHECK-NEXT: return %[[ALLOC1]] - -// ----- - -// Test Case: nested region control flow within a region interface. -// No copies are required in this case since the allocation finally escapes -// the method.
- -// CHECK-LABEL: func @inner_region_control_flow -func.func @inner_region_control_flow(%arg0 : index) -> memref<?x?xf32> { - %0 = memref.alloc(%arg0, %arg0) : memref<?x?xf32> - %1 = test.region_if %0 : memref<?x?xf32> -> (memref<?x?xf32>) then { - ^bb0(%arg1 : memref<?x?xf32>): - test.region_if_yield %arg1 : memref<?x?xf32> - } else { - ^bb0(%arg1 : memref<?x?xf32>): - test.region_if_yield %arg1 : memref<?x?xf32> - } join { - ^bb0(%arg1 : memref<?x?xf32>): - test.region_if_yield %arg1 : memref<?x?xf32> - } - return %1 : memref<?x?xf32> -} - -// CHECK: %[[ALLOC0:.*]] = memref.alloc(%arg0, %arg0) -// CHECK-NEXT: %[[ALLOC1:.*]] = test.region_if -// CHECK-NEXT: ^bb0(%[[ALLOC2:.*]]:{{.*}}): -// CHECK-NEXT: test.region_if_yield %[[ALLOC2]] -// CHECK: ^bb0(%[[ALLOC3:.*]]:{{.*}}): -// CHECK-NEXT: test.region_if_yield %[[ALLOC3]] -// CHECK: ^bb0(%[[ALLOC4:.*]]:{{.*}}): -// CHECK-NEXT: test.region_if_yield %[[ALLOC4]] -// CHECK: return %[[ALLOC1]] - -// ----- - -// CHECK-LABEL: func @subview -func.func @subview(%arg0 : index, %arg1 : index, %arg2 : memref<?x?xf32>) { - %0 = memref.alloc() : memref<64x4xf32, strided<[4, 1], offset: 0>> - %1 = memref.subview %0[%arg0, %arg1][%arg0, %arg1][%arg0, %arg1] : - memref<64x4xf32, strided<[4, 1], offset: 0>> - to memref<?x?xf32, strided<[?, ?], offset: ?>> - test.copy(%1, %arg2) : - (memref<?x?xf32, strided<[?, ?], offset: ?>>, memref<?x?xf32>) - return -} - -// CHECK-NEXT: %[[ALLOC:.*]] = memref.alloc() -// CHECK-NEXT: memref.subview -// CHECK-NEXT: test.copy -// CHECK-NEXT: memref.dealloc %[[ALLOC]] -// CHECK-NEXT: return - -// ----- - -// Test Case: In the presence of AllocaOps, only the AllocOps have to be freed. -// Therefore, allocas are not handled.
- -// CHECK-LABEL: func @condBranchAlloca -func.func @condBranchAlloca(%arg0: i1, %arg1: memref<2xf32>, %arg2: memref<2xf32>) { - cf.cond_br %arg0, ^bb1, ^bb2 -^bb1: - cf.br ^bb3(%arg1 : memref<2xf32>) -^bb2: - %0 = memref.alloca() : memref<2xf32> - test.buffer_based in(%arg1: memref<2xf32>) out(%0: memref<2xf32>) - cf.br ^bb3(%0 : memref<2xf32>) -^bb3(%1: memref<2xf32>): - test.copy(%1, %arg2) : (memref<2xf32>, memref<2xf32>) - return -} - -// CHECK-NEXT: cf.cond_br -// CHECK: %[[ALLOCA:.*]] = memref.alloca() -// CHECK: cf.br ^bb3(%[[ALLOCA:.*]]) -// CHECK-NEXT: ^bb3 -// CHECK-NEXT: test.copy -// CHECK-NEXT: return - -// ----- - -// Test Case: In the presence of AllocaOps, only the AllocOps have to be freed. -// Therefore, allocas are not handled. In this case, only alloc %0 has a -// dealloc. - -// CHECK-LABEL: func @ifElseAlloca -func.func @ifElseAlloca(%arg0: i1, %arg1: memref<2xf32>, %arg2: memref<2xf32>) { - %0 = memref.alloc() : memref<2xf32> - test.buffer_based in(%arg1: memref<2xf32>) out(%0: memref<2xf32>) - cf.cond_br %arg0, - ^bb1(%arg1, %0 : memref<2xf32>, memref<2xf32>), - ^bb2(%0, %arg1 : memref<2xf32>, memref<2xf32>) -^bb1(%1: memref<2xf32>, %2: memref<2xf32>): - cf.br ^bb3(%1, %2 : memref<2xf32>, memref<2xf32>) -^bb2(%3: memref<2xf32>, %4: memref<2xf32>): - cf.br ^bb3(%3, %4 : memref<2xf32>, memref<2xf32>) -^bb3(%5: memref<2xf32>, %6: memref<2xf32>): - %7 = memref.alloca() : memref<2xf32> - test.buffer_based in(%5: memref<2xf32>) out(%7: memref<2xf32>) - test.copy(%7, %arg2) : (memref<2xf32>, memref<2xf32>) - return -} - -// CHECK-NEXT: %[[ALLOC:.*]] = memref.alloc() -// CHECK-NEXT: test.buffer_based -// CHECK: %[[ALLOCA:.*]] = memref.alloca() -// CHECK-NEXT: test.buffer_based -// CHECK: memref.dealloc %[[ALLOC]] -// CHECK: test.copy -// CHECK-NEXT: return - -// ----- - -// CHECK-LABEL: func @ifElseNestedAlloca -func.func @ifElseNestedAlloca( - %arg0: i1, - %arg1: memref<2xf32>, - %arg2: memref<2xf32>) { - %0 = memref.alloca() : memref<2xf32> -
test.buffer_based in(%arg1: memref<2xf32>) out(%0: memref<2xf32>) - cf.cond_br %arg0, - ^bb1(%arg1, %0 : memref<2xf32>, memref<2xf32>), - ^bb2(%0, %arg1 : memref<2xf32>, memref<2xf32>) -^bb1(%1: memref<2xf32>, %2: memref<2xf32>): - cf.br ^bb5(%1, %2 : memref<2xf32>, memref<2xf32>) -^bb2(%3: memref<2xf32>, %4: memref<2xf32>): - cf.cond_br %arg0, ^bb3(%3 : memref<2xf32>), ^bb4(%4 : memref<2xf32>) -^bb3(%5: memref<2xf32>): - cf.br ^bb5(%5, %3 : memref<2xf32>, memref<2xf32>) -^bb4(%6: memref<2xf32>): - cf.br ^bb5(%3, %6 : memref<2xf32>, memref<2xf32>) -^bb5(%7: memref<2xf32>, %8: memref<2xf32>): - %9 = memref.alloc() : memref<2xf32> - test.buffer_based in(%7: memref<2xf32>) out(%9: memref<2xf32>) - test.copy(%9, %arg2) : (memref<2xf32>, memref<2xf32>) - return -} - -// CHECK-NEXT: %[[ALLOCA:.*]] = memref.alloca() -// CHECK-NEXT: test.buffer_based -// CHECK: %[[ALLOC:.*]] = memref.alloc() -// CHECK-NEXT: test.buffer_based -// CHECK: test.copy -// CHECK-NEXT: memref.dealloc %[[ALLOC]] -// CHECK-NEXT: return - -// ----- - -// CHECK-LABEL: func @nestedRegionsAndCondBranchAlloca -func.func @nestedRegionsAndCondBranchAlloca( - %arg0: i1, - %arg1: memref<2xf32>, - %arg2: memref<2xf32>) { - cf.cond_br %arg0, ^bb1, ^bb2 -^bb1: - cf.br ^bb3(%arg1 : memref<2xf32>) -^bb2: - %0 = memref.alloc() : memref<2xf32> - test.region_buffer_based in(%arg1: memref<2xf32>) out(%0: memref<2xf32>) { - ^bb0(%gen1_arg0: f32, %gen1_arg1: f32): - %1 = memref.alloca() : memref<2xf32> - test.buffer_based in(%arg1: memref<2xf32>) out(%1: memref<2xf32>) - %tmp1 = math.exp %gen1_arg0 : f32 - test.region_yield %tmp1 : f32 - } - cf.br ^bb3(%0 : memref<2xf32>) -^bb3(%1: memref<2xf32>): - test.copy(%1, %arg2) : (memref<2xf32>, memref<2xf32>) - return -} -// CHECK: (%[[cond:.*]]: {{.*}}, %[[ARG1:.*]]: {{.*}}, %{{.*}}: {{.*}}) -// CHECK-NEXT: cf.cond_br %[[cond]], ^[[BB1:.*]], ^[[BB2:.*]] -// CHECK: ^[[BB1]]: -// CHECK: %[[ALLOC0:.*]] = bufferization.clone -// CHECK: ^[[BB2]]: -// CHECK: %[[ALLOC1:.*]] = 
memref.alloc() -// CHECK-NEXT: test.region_buffer_based in(%[[ARG1]]{{.*}}out(%[[ALLOC1]] -// CHECK: %[[ALLOCA:.*]] = memref.alloca() -// CHECK-NEXT: test.buffer_based in(%[[ARG1]]{{.*}}out(%[[ALLOCA]] -// CHECK: %{{.*}} = math.exp -// CHECK: %[[ALLOC2:.*]] = bufferization.clone %[[ALLOC1]] -// CHECK-NEXT: memref.dealloc %[[ALLOC1]] -// CHECK: ^[[BB3:.*]]({{.*}}): -// CHECK: test.copy -// CHECK-NEXT: memref.dealloc - -// ----- - -// CHECK-LABEL: func @nestedRegionControlFlowAlloca -func.func @nestedRegionControlFlowAlloca( - %arg0 : index, - %arg1 : index) -> memref<?x?xf32> { - %0 = arith.cmpi eq, %arg0, %arg1 : index - %1 = memref.alloc(%arg0, %arg0) : memref<?x?xf32> - %2 = scf.if %0 -> (memref<?x?xf32>) { - scf.yield %1 : memref<?x?xf32> - } else { - %3 = memref.alloca(%arg0, %arg1) : memref<?x?xf32> - scf.yield %1 : memref<?x?xf32> - } - return %2 : memref<?x?xf32> -} - -// CHECK: %[[ALLOC0:.*]] = memref.alloc(%arg0, %arg0) -// CHECK-NEXT: %[[ALLOC1:.*]] = scf.if -// CHECK: scf.yield %[[ALLOC0]] -// CHECK: %[[ALLOCA:.*]] = memref.alloca(%arg0, %arg1) -// CHECK-NEXT: scf.yield %[[ALLOC0]] -// CHECK: return %[[ALLOC1]] - -// ----- - -// Test Case: structured control-flow loop using a nested alloc. -// The iteration argument %iterBuf has to be freed before yielding %3 to avoid -// memory leaks.
- -// CHECK-LABEL: func @loop_alloc -func.func @loop_alloc( - %lb: index, - %ub: index, - %step: index, - %buf: memref<2xf32>, - %res: memref<2xf32>) { - %0 = memref.alloc() : memref<2xf32> - %1 = scf.for %i = %lb to %ub step %step - iter_args(%iterBuf = %buf) -> memref<2xf32> { - %2 = arith.cmpi eq, %i, %ub : index - %3 = memref.alloc() : memref<2xf32> - scf.yield %3 : memref<2xf32> - } - test.copy(%1, %res) : (memref<2xf32>, memref<2xf32>) - return -} - -// CHECK: %[[ALLOC0:.*]] = memref.alloc() -// CHECK-NEXT: memref.dealloc %[[ALLOC0]] -// CHECK-NEXT: %[[ALLOC1:.*]] = bufferization.clone %arg3 -// CHECK: %[[ALLOC2:.*]] = scf.for {{.*}} iter_args -// CHECK-SAME: (%[[IALLOC:.*]] = %[[ALLOC1]] -// CHECK: arith.cmpi -// CHECK: memref.dealloc %[[IALLOC]] -// CHECK: %[[ALLOC3:.*]] = memref.alloc() -// CHECK: %[[ALLOC4:.*]] = bufferization.clone %[[ALLOC3]] -// CHECK: memref.dealloc %[[ALLOC3]] -// CHECK: scf.yield %[[ALLOC4]] -// CHECK: } -// CHECK: test.copy(%[[ALLOC2]], %arg4) -// CHECK-NEXT: memref.dealloc %[[ALLOC2]] - -// ----- - -// Test Case: structured control-flow loop with a nested if operation. -// The loop yields buffers that have been defined outside of the loop and the -// backedges only use the iteration arguments (or one of its aliases). -// Therefore, we do not have to (and are not allowed to) free any buffers -// that are passed via the backedges. 
- -// CHECK-LABEL: func @loop_nested_if_no_alloc -func.func @loop_nested_if_no_alloc( - %lb: index, - %ub: index, - %step: index, - %buf: memref<2xf32>, - %res: memref<2xf32>) { - %0 = memref.alloc() : memref<2xf32> - %1 = scf.for %i = %lb to %ub step %step - iter_args(%iterBuf = %buf) -> memref<2xf32> { - %2 = arith.cmpi eq, %i, %ub : index - %3 = scf.if %2 -> (memref<2xf32>) { - scf.yield %0 : memref<2xf32> - } else { - scf.yield %iterBuf : memref<2xf32> - } - scf.yield %3 : memref<2xf32> - } - test.copy(%1, %res) : (memref<2xf32>, memref<2xf32>) - return -} - -// CHECK: %[[ALLOC0:.*]] = memref.alloc() -// CHECK-NEXT: %[[ALLOC1:.*]] = scf.for {{.*}} iter_args(%[[IALLOC:.*]] = -// CHECK: %[[ALLOC2:.*]] = scf.if -// CHECK: scf.yield %[[ALLOC0]] -// CHECK: scf.yield %[[IALLOC]] -// CHECK: scf.yield %[[ALLOC2]] -// CHECK: test.copy(%[[ALLOC1]], %arg4) -// CHECK: memref.dealloc %[[ALLOC0]] - -// ----- - -// Test Case: structured control-flow loop with a nested if operation using -// a deeply nested buffer allocation. -// Since the innermost allocation happens in a divergent branch, we have to -// introduce additional copies for the nested if operation. Since the loop's -// yield operation "returns" %3, it will return a newly allocated buffer. -// Therefore, we have to free the iteration argument %iterBuf before -// "returning" %3. 
- -// CHECK-LABEL: func @loop_nested_if_alloc -func.func @loop_nested_if_alloc( - %lb: index, - %ub: index, - %step: index, - %buf: memref<2xf32>) -> memref<2xf32> { - %0 = memref.alloc() : memref<2xf32> - %1 = scf.for %i = %lb to %ub step %step - iter_args(%iterBuf = %buf) -> memref<2xf32> { - %2 = arith.cmpi eq, %i, %ub : index - %3 = scf.if %2 -> (memref<2xf32>) { - %4 = memref.alloc() : memref<2xf32> - scf.yield %4 : memref<2xf32> - } else { - scf.yield %0 : memref<2xf32> - } - scf.yield %3 : memref<2xf32> - } - return %1 : memref<2xf32> -} - -// CHECK: %[[ALLOC0:.*]] = memref.alloc() -// CHECK-NEXT: %[[ALLOC1:.*]] = bufferization.clone %arg3 -// CHECK-NEXT: %[[ALLOC2:.*]] = scf.for {{.*}} iter_args -// CHECK-SAME: (%[[IALLOC:.*]] = %[[ALLOC1]] -// CHECK: memref.dealloc %[[IALLOC]] -// CHECK: %[[ALLOC3:.*]] = scf.if - -// CHECK: %[[ALLOC4:.*]] = memref.alloc() -// CHECK-NEXT: %[[ALLOC5:.*]] = bufferization.clone %[[ALLOC4]] -// CHECK-NEXT: memref.dealloc %[[ALLOC4]] -// CHECK-NEXT: scf.yield %[[ALLOC5]] - -// CHECK: %[[ALLOC6:.*]] = bufferization.clone %[[ALLOC0]] -// CHECK-NEXT: scf.yield %[[ALLOC6]] - -// CHECK: %[[ALLOC7:.*]] = bufferization.clone %[[ALLOC3]] -// CHECK-NEXT: memref.dealloc %[[ALLOC3]] -// CHECK-NEXT: scf.yield %[[ALLOC7]] - -// CHECK: memref.dealloc %[[ALLOC0]] -// CHECK-NEXT: return %[[ALLOC2]] - -// ----- - -// Test Case: several nested structured control-flow loops with a deeply nested -// buffer allocation inside an if operation. -// Same behavior as in loop_nested_if_alloc: we have to insert deallocations -// before each yield in all loops recursively.
- -// CHECK-LABEL: func @loop_nested_alloc -func.func @loop_nested_alloc( - %lb: index, - %ub: index, - %step: index, - %buf: memref<2xf32>, - %res: memref<2xf32>) { - %0 = memref.alloc() : memref<2xf32> - %1 = scf.for %i = %lb to %ub step %step - iter_args(%iterBuf = %buf) -> memref<2xf32> { - %2 = scf.for %i2 = %lb to %ub step %step - iter_args(%iterBuf2 = %iterBuf) -> memref<2xf32> { - %3 = scf.for %i3 = %lb to %ub step %step - iter_args(%iterBuf3 = %iterBuf2) -> memref<2xf32> { - %4 = memref.alloc() : memref<2xf32> - %5 = arith.cmpi eq, %i, %ub : index - %6 = scf.if %5 -> (memref<2xf32>) { - %7 = memref.alloc() : memref<2xf32> - scf.yield %7 : memref<2xf32> - } else { - scf.yield %iterBuf3 : memref<2xf32> - } - scf.yield %6 : memref<2xf32> - } - scf.yield %3 : memref<2xf32> - } - scf.yield %2 : memref<2xf32> - } - test.copy(%1, %res) : (memref<2xf32>, memref<2xf32>) - return -} - -// CHECK: %[[ALLOC0:.*]] = memref.alloc() -// CHECK-NEXT: memref.dealloc %[[ALLOC0]] -// CHECK-NEXT: %[[ALLOC1:.*]] = bufferization.clone %arg3 -// CHECK-NEXT: %[[VAL_7:.*]] = scf.for {{.*}} iter_args -// CHECK-SAME: (%[[IALLOC0:.*]] = %[[ALLOC1]]) -// CHECK-NEXT: %[[ALLOC2:.*]] = bufferization.clone %[[IALLOC0]] -// CHECK-NEXT: memref.dealloc %[[IALLOC0]] -// CHECK-NEXT: %[[ALLOC3:.*]] = scf.for {{.*}} iter_args -// CHECK-SAME: (%[[IALLOC1:.*]] = %[[ALLOC2]]) -// CHECK-NEXT: %[[ALLOC5:.*]] = bufferization.clone %[[IALLOC1]] -// CHECK-NEXT: memref.dealloc %[[IALLOC1]] - -// CHECK: %[[ALLOC6:.*]] = scf.for {{.*}} iter_args -// CHECK-SAME: (%[[IALLOC2:.*]] = %[[ALLOC5]]) -// CHECK: %[[ALLOC8:.*]] = memref.alloc() -// CHECK-NEXT: memref.dealloc %[[ALLOC8]] -// CHECK: %[[ALLOC9:.*]] = scf.if - -// CHECK: %[[ALLOC11:.*]] = memref.alloc() -// CHECK-NEXT: %[[ALLOC12:.*]] = bufferization.clone %[[ALLOC11]] -// CHECK-NEXT: memref.dealloc %[[ALLOC11]] -// CHECK-NEXT: scf.yield %[[ALLOC12]] - -// CHECK: %[[ALLOC13:.*]] = bufferization.clone %[[IALLOC2]] -// CHECK-NEXT: scf.yield %[[ALLOC13]] - 
-// CHECK: memref.dealloc %[[IALLOC2]]
-// CHECK-NEXT: %[[ALLOC10:.*]] = bufferization.clone %[[ALLOC9]]
-// CHECK-NEXT: memref.dealloc %[[ALLOC9]]
-// CHECK-NEXT: scf.yield %[[ALLOC10]]
-
-// CHECK: %[[ALLOC7:.*]] = bufferization.clone %[[ALLOC6]]
-// CHECK-NEXT: memref.dealloc %[[ALLOC6]]
-// CHECK-NEXT: scf.yield %[[ALLOC7]]
-
-// CHECK: %[[ALLOC4:.*]] = bufferization.clone %[[ALLOC3]]
-// CHECK-NEXT: memref.dealloc %[[ALLOC3]]
-// CHECK-NEXT: scf.yield %[[ALLOC4]]
-
-// CHECK: test.copy(%[[VAL_7]], %arg4)
-// CHECK-NEXT: memref.dealloc %[[VAL_7]]
-
-// -----
-
-// CHECK-LABEL: func @affine_loop
-func.func @affine_loop() {
-  %buffer = memref.alloc() : memref<1024xf32>
-  %sum_init_0 = arith.constant 0.0 : f32
-  %res = affine.for %i = 0 to 10 step 2 iter_args(%sum_iter = %sum_init_0) -> f32 {
-    %t = affine.load %buffer[%i] : memref<1024xf32>
-    %sum_next = arith.addf %sum_iter, %t : f32
-    affine.yield %sum_next : f32
-  }
-  // CHECK: %[[M:.*]] = memref.alloc
-  // CHECK: affine.for
-  // CHECK: }
-  // CHECK-NEXT: memref.dealloc %[[M]]
-  return
-}
-
-// -----
-
-// Test Case: explicit control-flow loop with a dynamically allocated buffer.
-// The BufferDeallocation transformation should fail on this explicit
-// control-flow loop since such loops are not supported.
-
-// expected-error@+1 {{Only structured control-flow loops are supported}}
-func.func @loop_dynalloc(
-  %arg0 : i32,
-  %arg1 : i32,
-  %arg2: memref<?xf32>,
-  %arg3: memref<?xf32>) {
-  %const0 = arith.constant 0 : i32
-  cf.br ^loopHeader(%const0, %arg2 : i32, memref<?xf32>)
-
-^loopHeader(%i : i32, %buff : memref<?xf32>):
-  %lessThan = arith.cmpi slt, %i, %arg1 : i32
-  cf.cond_br %lessThan,
-    ^loopBody(%i, %buff : i32, memref<?xf32>),
-    ^exit(%buff : memref<?xf32>)
-
-^loopBody(%val : i32, %buff2: memref<?xf32>):
-  %const1 = arith.constant 1 : i32
-  %inc = arith.addi %val, %const1 : i32
-  %size = arith.index_cast %inc : i32 to index
-  %alloc1 = memref.alloc(%size) : memref<?xf32>
-  cf.br ^loopHeader(%inc, %alloc1 : i32, memref<?xf32>)
-
-^exit(%buff3 : memref<?xf32>):
-  test.copy(%buff3, %arg3) : (memref<?xf32>, memref<?xf32>)
-  return
-}
-
-// -----
-
-// Test Case: explicit control-flow (do-while-style) loop with a buffer
-// allocated inside the loop body. The BufferDeallocation transformation should
-// fail on this explicit control-flow loop since such loops are not supported.
-
-// expected-error@+1 {{Only structured control-flow loops are supported}}
-func.func @do_loop_alloc(
-  %arg0 : i32,
-  %arg1 : i32,
-  %arg2: memref<2xf32>,
-  %arg3: memref<2xf32>) {
-  %const0 = arith.constant 0 : i32
-  cf.br ^loopBody(%const0, %arg2 : i32, memref<2xf32>)
-
-^loopBody(%val : i32, %buff2: memref<2xf32>):
-  %const1 = arith.constant 1 : i32
-  %inc = arith.addi %val, %const1 : i32
-  %alloc1 = memref.alloc() : memref<2xf32>
-  cf.br ^loopHeader(%inc, %alloc1 : i32, memref<2xf32>)
-
-^loopHeader(%i : i32, %buff : memref<2xf32>):
-  %lessThan = arith.cmpi slt, %i, %arg1 : i32
-  cf.cond_br %lessThan,
-    ^loopBody(%i, %buff : i32, memref<2xf32>),
-    ^exit(%buff : memref<2xf32>)
-
-^exit(%buff3 : memref<2xf32>):
-  test.copy(%buff3, %arg3) : (memref<2xf32>, memref<2xf32>)
-  return
-}
-
-// -----
-
-// CHECK-LABEL: func @assumingOp(
-func.func @assumingOp(
-  %arg0: !shape.witness,
-  %arg2: memref<2xf32>,
-  %arg3: memref<2xf32>) {
-  // Confirm the alloc will be dealloc'ed in the block.
- %1 = shape.assuming %arg0 -> memref<2xf32> { - %0 = memref.alloc() : memref<2xf32> - shape.assuming_yield %arg2 : memref<2xf32> - } - // Confirm the alloc will be returned and dealloc'ed after its use. - %3 = shape.assuming %arg0 -> memref<2xf32> { - %2 = memref.alloc() : memref<2xf32> - shape.assuming_yield %2 : memref<2xf32> - } - test.copy(%3, %arg3) : (memref<2xf32>, memref<2xf32>) - return -} - -// CHECK-SAME: %[[ARG0:.*]]: !shape.witness, -// CHECK-SAME: %[[ARG1:.*]]: {{.*}}, -// CHECK-SAME: %[[ARG2:.*]]: {{.*}} -// CHECK: %[[UNUSED_RESULT:.*]] = shape.assuming %[[ARG0]] -// CHECK-NEXT: %[[ALLOC0:.*]] = memref.alloc() -// CHECK-NEXT: memref.dealloc %[[ALLOC0]] -// CHECK-NEXT: shape.assuming_yield %[[ARG1]] -// CHECK: %[[ASSUMING_RESULT:.*]] = shape.assuming %[[ARG0]] -// CHECK-NEXT: %[[TMP_ALLOC:.*]] = memref.alloc() -// CHECK-NEXT: %[[RETURNING_ALLOC:.*]] = bufferization.clone %[[TMP_ALLOC]] -// CHECK-NEXT: memref.dealloc %[[TMP_ALLOC]] -// CHECK-NEXT: shape.assuming_yield %[[RETURNING_ALLOC]] -// CHECK: test.copy(%[[ASSUMING_RESULT:.*]], %[[ARG2]]) -// CHECK-NEXT: memref.dealloc %[[ASSUMING_RESULT]] - -// ----- - -// Test Case: The op "test.bar" does not implement the RegionBranchOpInterface. -// This is not allowed in buffer deallocation. 
-
-func.func @noRegionBranchOpInterface() {
-// expected-error@+1 {{All operations with attached regions need to implement the RegionBranchOpInterface.}}
-  %0 = "test.bar"() ({
-// expected-error@+1 {{All operations with attached regions need to implement the RegionBranchOpInterface.}}
-    %1 = "test.bar"() ({
-      "test.yield"() : () -> ()
-    }) : () -> (i32)
-    "test.yield"() : () -> ()
-  }) : () -> (i32)
-  "test.terminator"() : () -> ()
-}
-
-// -----
-
-// CHECK-LABEL: func @dealloc_existing_clones
-// CHECK: (%[[ARG0:.*]]: memref<?x?xf64>, %[[ARG1:.*]]: memref<?x?xf64>)
-// CHECK: %[[RES0:.*]] = bufferization.clone %[[ARG0]]
-// CHECK: %[[RES1:.*]] = bufferization.clone %[[ARG1]]
-// CHECK-NOT: memref.dealloc %[[RES0]]
-// CHECK: memref.dealloc %[[RES1]]
-// CHECK: return %[[RES0]]
-func.func @dealloc_existing_clones(%arg0: memref<?x?xf64>, %arg1: memref<?x?xf64>) -> memref<?x?xf64> {
-  %0 = bufferization.clone %arg0 : memref<?x?xf64> to memref<?x?xf64>
-  %1 = bufferization.clone %arg1 : memref<?x?xf64> to memref<?x?xf64>
-  return %0 : memref<?x?xf64>
-}
-
-// -----
-
-// CHECK-LABEL: func @while_two_arg
-func.func @while_two_arg(%arg0: index) {
-  %a = memref.alloc(%arg0) : memref<?xf32>
-// CHECK: %[[WHILE:.*]]:2 = scf.while (%[[ARG1:.*]] = %[[ALLOC:.*]], %[[ARG2:.*]] = %[[CLONE:.*]])
-  scf.while (%arg1 = %a, %arg2 = %a) : (memref<?xf32>, memref<?xf32>) -> (memref<?xf32>, memref<?xf32>) {
-// CHECK-NEXT: make_condition
-    %0 = "test.make_condition"() : () -> i1
-// CHECK-NEXT: bufferization.clone %[[ARG2]]
-// CHECK-NEXT: memref.dealloc %[[ARG2]]
-    scf.condition(%0) %arg1, %arg2 : memref<?xf32>, memref<?xf32>
-  } do {
-  ^bb0(%arg1: memref<?xf32>, %arg2: memref<?xf32>):
-// CHECK: %[[ALLOC2:.*]] = memref.alloc
-    %b = memref.alloc(%arg0) : memref<?xf32>
-// CHECK: memref.dealloc %[[ARG2]]
-// CHECK: %[[CLONE2:.*]] = bufferization.clone %[[ALLOC2]]
-// CHECK: memref.dealloc %[[ALLOC2]]
-    scf.yield %arg1, %b : memref<?xf32>, memref<?xf32>
-  }
-// CHECK: }
-// CHECK-NEXT: memref.dealloc %[[WHILE]]#1
-// CHECK-NEXT: memref.dealloc %[[ALLOC]]
-// CHECK-NEXT: return
-  return
-}
-
-// -----
-
-func.func @while_three_arg(%arg0: index) {
-// CHECK: 
%[[ALLOC:.*]] = memref.alloc
-  %a = memref.alloc(%arg0) : memref<?xf32>
-// CHECK-NEXT: %[[CLONE1:.*]] = bufferization.clone %[[ALLOC]]
-// CHECK-NEXT: %[[CLONE2:.*]] = bufferization.clone %[[ALLOC]]
-// CHECK-NEXT: %[[CLONE3:.*]] = bufferization.clone %[[ALLOC]]
-// CHECK-NEXT: memref.dealloc %[[ALLOC]]
-// CHECK-NEXT: %[[WHILE:.*]]:3 = scf.while
-// FIXME: This is non-deterministic
-// CHECK-SAME-DAG: [[CLONE1]]
-// CHECK-SAME-DAG: [[CLONE2]]
-// CHECK-SAME-DAG: [[CLONE3]]
-  scf.while (%arg1 = %a, %arg2 = %a, %arg3 = %a) : (memref<?xf32>, memref<?xf32>, memref<?xf32>) -> (memref<?xf32>, memref<?xf32>, memref<?xf32>) {
-    %0 = "test.make_condition"() : () -> i1
-    scf.condition(%0) %arg1, %arg2, %arg3 : memref<?xf32>, memref<?xf32>, memref<?xf32>
-  } do {
-  ^bb0(%arg1: memref<?xf32>, %arg2: memref<?xf32>, %arg3: memref<?xf32>):
-    %b = memref.alloc(%arg0) : memref<?xf32>
-    %q = memref.alloc(%arg0) : memref<?xf32>
-    scf.yield %q, %b, %arg2 : memref<?xf32>, memref<?xf32>, memref<?xf32>
-  }
-// CHECK-DAG: memref.dealloc %[[WHILE]]#0
-// CHECK-DAG: memref.dealloc %[[WHILE]]#1
-// CHECK-DAG: memref.dealloc %[[WHILE]]#2
-// CHECK-NEXT: return
-  return
-}
-
-// -----
-
-func.func @select_aliases(%arg0: index, %arg1: memref<?xi8>, %arg2: i1) {
-  // CHECK: memref.alloc
-  // CHECK: memref.alloc
-  // CHECK: arith.select
-  // CHECK: test.copy
-  // CHECK: memref.dealloc
-  // CHECK: memref.dealloc
-  %0 = memref.alloc(%arg0) : memref<?xi8>
-  %1 = memref.alloc(%arg0) : memref<?xi8>
-  %2 = arith.select %arg2, %0, %1 : memref<?xi8>
-  test.copy(%2, %arg1) : (memref<?xi8>, memref<?xi8>)
-  return
-}
-
-// -----
-
-func.func @f(%arg0: memref<f64>) -> memref<f64> {
-  return %arg0 : memref<f64>
-}
-
-// CHECK-LABEL: func @function_call
-// CHECK: memref.alloc
-// CHECK: memref.alloc
-// CHECK: call
-// CHECK: test.copy
-// CHECK: memref.dealloc
-// CHECK: memref.dealloc
-func.func @function_call() {
-  %alloc = memref.alloc() : memref<f64>
-  %alloc2 = memref.alloc() : memref<f64>
-  %ret = call @f(%alloc) : (memref<f64>) -> memref<f64>
-  test.copy(%ret, %alloc2) : (memref<f64>, memref<f64>)
-  return
-}
-
-// -----
-
-// Memref allocated in `then` region and passed back to the parent if op.
-#set = affine_set<() : (0 >= 0)> -// CHECK-LABEL: func @test_affine_if_1 -// CHECK-SAME: %[[ARG0:.*]]: memref<10xf32>) -> memref<10xf32> { -func.func @test_affine_if_1(%arg0: memref<10xf32>) -> memref<10xf32> { - %0 = affine.if #set() -> memref<10xf32> { - %alloc = memref.alloc() : memref<10xf32> - affine.yield %alloc : memref<10xf32> - } else { - affine.yield %arg0 : memref<10xf32> - } - return %0 : memref<10xf32> -} -// CHECK-NEXT: %[[IF:.*]] = affine.if -// CHECK-NEXT: %[[MEMREF:.*]] = memref.alloc() : memref<10xf32> -// CHECK-NEXT: %[[CLONED:.*]] = bufferization.clone %[[MEMREF]] : memref<10xf32> to memref<10xf32> -// CHECK-NEXT: memref.dealloc %[[MEMREF]] : memref<10xf32> -// CHECK-NEXT: affine.yield %[[CLONED]] : memref<10xf32> -// CHECK-NEXT: } else { -// CHECK-NEXT: %[[ARG0_CLONE:.*]] = bufferization.clone %[[ARG0]] : memref<10xf32> to memref<10xf32> -// CHECK-NEXT: affine.yield %[[ARG0_CLONE]] : memref<10xf32> -// CHECK-NEXT: } -// CHECK-NEXT: return %[[IF]] : memref<10xf32> - -// ----- - -// Memref allocated before parent IfOp and used in `then` region. -// Expected result: deallocation should happen after affine.if op. 
-#set = affine_set<() : (0 >= 0)> -// CHECK-LABEL: func @test_affine_if_2() -> memref<10xf32> { -func.func @test_affine_if_2() -> memref<10xf32> { - %alloc0 = memref.alloc() : memref<10xf32> - %0 = affine.if #set() -> memref<10xf32> { - affine.yield %alloc0 : memref<10xf32> - } else { - %alloc = memref.alloc() : memref<10xf32> - affine.yield %alloc : memref<10xf32> - } - return %0 : memref<10xf32> -} -// CHECK-NEXT: %[[ALLOC:.*]] = memref.alloc() : memref<10xf32> -// CHECK-NEXT: %[[IF_RES:.*]] = affine.if {{.*}} -> memref<10xf32> { -// CHECK-NEXT: %[[ALLOC_CLONE:.*]] = bufferization.clone %[[ALLOC]] : memref<10xf32> to memref<10xf32> -// CHECK-NEXT: affine.yield %[[ALLOC_CLONE]] : memref<10xf32> -// CHECK-NEXT: } else { -// CHECK-NEXT: %[[ALLOC2:.*]] = memref.alloc() : memref<10xf32> -// CHECK-NEXT: %[[ALLOC2_CLONE:.*]] = bufferization.clone %[[ALLOC2]] : memref<10xf32> to memref<10xf32> -// CHECK-NEXT: memref.dealloc %[[ALLOC2]] : memref<10xf32> -// CHECK-NEXT: affine.yield %[[ALLOC2_CLONE]] : memref<10xf32> -// CHECK-NEXT: } -// CHECK-NEXT: memref.dealloc %[[ALLOC]] : memref<10xf32> -// CHECK-NEXT: return %[[IF_RES]] : memref<10xf32> - -// ----- - -// Memref allocated before parent IfOp and used in `else` region. -// Expected result: deallocation should happen after affine.if op. 
-#set = affine_set<() : (0 >= 0)> -// CHECK-LABEL: func @test_affine_if_3() -> memref<10xf32> { -func.func @test_affine_if_3() -> memref<10xf32> { - %alloc0 = memref.alloc() : memref<10xf32> - %0 = affine.if #set() -> memref<10xf32> { - %alloc = memref.alloc() : memref<10xf32> - affine.yield %alloc : memref<10xf32> - } else { - affine.yield %alloc0 : memref<10xf32> - } - return %0 : memref<10xf32> -} -// CHECK-NEXT: %[[ALLOC:.*]] = memref.alloc() : memref<10xf32> -// CHECK-NEXT: %[[IFRES:.*]] = affine.if {{.*}} -> memref<10xf32> { -// CHECK-NEXT: memref.alloc -// CHECK-NEXT: bufferization.clone -// CHECK-NEXT: memref.dealloc -// CHECK-NEXT: affine.yield -// CHECK-NEXT: } else { -// CHECK-NEXT: bufferization.clone -// CHECK-NEXT: affine.yield -// CHECK-NEXT: } -// CHECK-NEXT: memref.dealloc %[[ALLOC]] : memref<10xf32> -// CHECK-NEXT: return %[[IFRES]] : memref<10xf32> - -// ----- - -// Memref allocated before parent IfOp and not used later. -// Expected result: deallocation should happen before affine.if op. -#set = affine_set<() : (0 >= 0)> -// CHECK-LABEL: func @test_affine_if_4({{.*}}: memref<10xf32>) -> memref<10xf32> { -func.func @test_affine_if_4(%arg0 : memref<10xf32>) -> memref<10xf32> { - %alloc0 = memref.alloc() : memref<10xf32> - %0 = affine.if #set() -> memref<10xf32> { - affine.yield %arg0 : memref<10xf32> - } else { - %alloc = memref.alloc() : memref<10xf32> - affine.yield %alloc : memref<10xf32> - } - return %0 : memref<10xf32> -} -// CHECK-NEXT: %[[ALLOC:.*]] = memref.alloc() : memref<10xf32> -// CHECK-NEXT: memref.dealloc %[[ALLOC]] : memref<10xf32> -// CHECK-NEXT: affine.if - -// ----- - -// Ensure we free the realloc, not the alloc. 
- -// CHECK-LABEL: func @auto_dealloc() -func.func @auto_dealloc() { - %c10 = arith.constant 10 : index - %c100 = arith.constant 100 : index - %alloc = memref.alloc(%c10) : memref - %realloc = memref.realloc %alloc(%c100) : memref to memref - return -} -// CHECK-DAG: %[[C10:.*]] = arith.constant 10 : index -// CHECK-DAG: %[[C100:.*]] = arith.constant 100 : index -// CHECK-NEXT: %[[A:.*]] = memref.alloc(%[[C10]]) : memref -// CHECK-NEXT: %[[R:.*]] = memref.realloc %alloc(%[[C100]]) : memref to memref -// CHECK-NEXT: memref.dealloc %[[R]] : memref -// CHECK-NEXT: return - - diff --git a/mlir/test/Pass/pipeline-invalid.mlir b/mlir/test/Pass/pipeline-invalid.mlir index f9dd4c29dd7f0..948a13384bc75 100644 --- a/mlir/test/Pass/pipeline-invalid.mlir +++ b/mlir/test/Pass/pipeline-invalid.mlir @@ -1,8 +1,8 @@ // RUN: mlir-opt --no-implicit-module \ -// RUN: --pass-pipeline='any(buffer-deallocation)' --verify-diagnostics \ +// RUN: --pass-pipeline='any(test-function-pass)' --verify-diagnostics \ // RUN: --split-input-file %s -// Note: "buffer-deallocation" is a function pass. Any other function pass could +// Note: "test-function-pass" is a function pass. Any other function pass could // be used for this test. // expected-error@below {{trying to schedule a pass on an operation not marked as 'IsolatedFromAbove'}}