[Impeller] new blur: limit uvs to blur region #49299

gaaclarke · 2023-12-20T20:19:17Z

This will run the blur passes on a subset of the texture passed into it that will actually be blurred. This is an optimization for backdrop filters.

issue: flutter/flutter#131580
test coverage: devicelab backdrop_filter_perf_ios__timeline_summary

performance results

This results in a 50% reduction in average GPU time in our benchmark, 79% reduction in the "99% percentile".

BEFORE

    "average_vsync_transitions_missed": 3.889423076923077,
    "90th_percentile_vsync_transitions_missed": 4.0,
    "99th_percentile_vsync_transitions_missed": 4.0,
    "average_vsync_frame_lag": 0.0,
    "90th_percentile_vsync_frame_lag": 0.0,
    "99th_percentile_vsync_frame_lag": 0.0,
    "average_layer_cache_count": 0.0,
    "90th_percentile_layer_cache_count": 0.0,
    "99th_percentile_layer_cache_count": 0.0,
    "average_frame_request_pending_latency": 16654.368333333332,
    "90th_percentile_frame_request_pending_latency": 16692.0,
    "99th_percentile_frame_request_pending_latency": 16749.0,
    "worst_layer_cache_count": 0.0,
    "average_layer_cache_memory": 0.0,
    "90th_percentile_layer_cache_memory": 0.0,
    "99th_percentile_layer_cache_memory": 0.0,
    "worst_layer_cache_memory": 0.0,
    "average_picture_cache_count": 0.0,
    "90th_percentile_picture_cache_count": 0.0,
    "99th_percentile_picture_cache_count": 0.0,
    "worst_picture_cache_count": 0.0,
    "average_picture_cache_memory": 0.0,
    "90th_percentile_picture_cache_memory": 0.0,
    "99th_percentile_picture_cache_memory": 0.0,
    "worst_picture_cache_memory": 0.0,
    "total_ui_gc_time": 0.659,
    "30hz_frame_percentage": 0.0,
    "60hz_frame_percentage": 100.0,
    "80hz_frame_percentage": 0.0,
    "90hz_frame_percentage": 0.0,
    "120hz_frame_percentage": 0.0,
    "illegal_refresh_rate_frame_count": 0,
    "average_gpu_frame_time": 52.13341346153846,
    "90th_percentile_gpu_frame_time": 62.5,
    "99th_percentile_gpu_frame_time": 62.5,
    "worst_gpu_frame_time": 62.5,
    "average_cpu_usage": 69.08979595918369,
    "90th_percentile_cpu_usage": 70.4,
    "99th_percentile_cpu_usage": 71.699999,
    "average_gpu_usage": 100.0,
    "90th_percentile_gpu_usage": 100.0,
    "99th_percentile_gpu_usage": 100.0,
    "average_memory_usage": 118.79942602040816,
    "90th_percentile_memory_usage": 138.125,
    "99th_percentile_memory_usage": 143.65625

AFTER

    "average_vsync_transitions_missed": 2.0,
    "90th_percentile_vsync_transitions_missed": 2.0,
    "99th_percentile_vsync_transitions_missed": 2.0,
    "average_vsync_frame_lag": 0.0,
    "90th_percentile_vsync_frame_lag": 0.0,
    "99th_percentile_vsync_frame_lag": 0.0,
    "average_layer_cache_count": 0.0,
    "90th_percentile_layer_cache_count": 0.0,
    "99th_percentile_layer_cache_count": 0.0,
    "average_frame_request_pending_latency": 16635.025,
    "90th_percentile_frame_request_pending_latency": 16715.0,
    "99th_percentile_frame_request_pending_latency": 16802.0,
    "worst_layer_cache_count": 0.0,
    "average_layer_cache_memory": 0.0,
    "90th_percentile_layer_cache_memory": 0.0,
    "99th_percentile_layer_cache_memory": 0.0,
    "worst_layer_cache_memory": 0.0,
    "average_picture_cache_count": 0.0,
    "90th_percentile_picture_cache_count": 0.0,
    "99th_percentile_picture_cache_count": 0.0,
    "worst_picture_cache_count": 0.0,
    "average_picture_cache_memory": 0.0,
    "90th_percentile_picture_cache_memory": 0.0,
    "99th_percentile_picture_cache_memory": 0.0,
    "worst_picture_cache_memory": 0.0,
    "total_ui_gc_time": 1.732,
    "30hz_frame_percentage": 0.0,
    "60hz_frame_percentage": 100.0,
    "80hz_frame_percentage": 0.0,
    "90hz_frame_percentage": 0.0,
    "120hz_frame_percentage": 0.0,
    "illegal_refresh_rate_frame_count": 0,
    "average_gpu_frame_time": 25.01558603491272,
    "90th_percentile_gpu_frame_time": 31.25,
    "99th_percentile_gpu_frame_time": 31.25,

Pre-launch Checklist

I read the Contributor Guide and followed the process outlined there for submitting PRs.
I read the Tree Hygiene wiki page, which explains my responsibilities.
I read and followed the Flutter Style Guide and the C++, Objective-C, Java style guides.
I listed at least one issue that this PR fixes in the description above.
I added new tests to check the change I am making or feature I am adding, or the PR is test-exempt. See testing the engine for instructions on writing and running engine tests.
I updated/added relevant documentation (doc comments with ///).
I signed the CLA.
All existing and new tests are passing.

If you need help, consider asking for advice on the #hackers-new channel on Discord.

gaaclarke · 2023-12-21T00:26:43Z

This doesn't quite work yet because of the rotation case. I attempted to switch the calculation to source coordinates but that doesn't seem to work:

  Quad blur_uvs = {Point(0, 0), Point(1, 0), Point(0, 1), Point(1, 1)};
  if (expanded_coverage_hint.has_value() &&
      input_snapshot_coverage.has_value()) {
    Matrix local_to_source =
        (entity.GetTransform()).Invert();
    Quad source_expanded_coverage_hint_quad =
        local_to_source.Transform(expanded_coverage_hint->GetPoints());
    Quad source_input_snapshot_coverage_quad =
        local_to_source.Transform(input_snapshot_coverage->GetPoints());
    Rect source_expanded_coverage_hint =
        Rect::MakePointBounds(source_expanded_coverage_hint_quad).value();
    Rect source_input_snapshot_coverage =
        Rect::MakePointBounds(source_input_snapshot_coverage_quad).value();
    std::optional<Rect> uvs = MakeReferenceUVs(source_input_snapshot_coverage,
                                               source_expanded_coverage_hint)
                                  .Intersection(Rect::MakeSize(Size(1, 1)));
    FML_DCHECK(uvs.has_value());
    if (uvs.has_value()) {
      blur_uvs[0] = uvs->GetLeftTop();
      blur_uvs[1] = uvs->GetRightTop();
      blur_uvs[2] = uvs->GetLeftBottom();
      blur_uvs[3] = uvs->GetRightBottom();
    }
  }

gaaclarke · 2024-01-02T21:44:42Z

After this change we are 4x faster than the old blur in the benchmark: #48472 (comment)
(average_gpu_usage: 100.0 vs 25.02).

gaaclarke · 2024-01-02T22:07:56Z

I punted on making this work for rotated entities. I think there is a bug in the coverage of rotated entities and since this optimization is built on that, it was demonstrating that bug. Bugs in coverage are less of a big deal unless rendering is based on it.

flutter-dashboard · 2024-01-02T22:19:45Z

It looks like this pull request may not have tests. Please make sure to add tests before merging. If you need an exemption to this rule, contact "@test-exemption-reviewer" in the #hackers channel in Chat (don't just cc them here, they won't see it! Use Discord!).

If you are not sure if you need tests, consider this rule of thumb: the purpose of a test is to make sure someone doesn't accidentally revert the fix. Ask yourself, is there anything in your PR that you feel it is important we not accidentally revert back to how it was before your fix?

Reviewers: Read the Tree Hygiene page and make sure this patch meets those guidelines before LGTMing.

impeller/entity/contents/filters/filter_contents.cc

impeller/entity/contents/filters/gaussian_blur_filter_contents.cc

bdero · 2023-12-21T07:28:51Z

impeller/entity/contents/filters/gaussian_blur_filter_contents.cc

+      blur_uvs[2] = uvs->GetLeftBottom();
+      blur_uvs[3] = uvs->GetRightBottom();
+    }
+  }


As you pointed out, this doesn't work when the basis of the entity transform isn't identity. Video of the issue:

Screen.Recording.2023-12-20.at.7.19.06.PM.mov

Just so we're on the same page: Snapshots are an instruction for how to render a texture consisting of sampling state, a transform, and the texture itself.

The issue in detail

This issue happens because the coverage_hint is given in screen/pass space and texture-based filter inputs (like TiledTextureContents or TextureContents) create snapshots that are in local space. So the Entity's transform will simply be forwarded through to the Snapshot::transform field. So the act of snapshotting incurs zero rendering overhead by deferring the transform via the snapshot protocol.

Filters (and everything else that consumes snapshots) must contend with this behavior. And that's a very good thing! Because absorbing the snapshot would mean burdening the blur workload with a very expensive 4th RenderPass that does nothing but sample the entire original image and print it to a possibly oversized texture with a bunch of unused fully transparent area. And that unused transparent area would be a very annoying problem to deal with in the blur, because we'd need to carefully work around its presence when generating the outer sampling region in the second pass. I wouldn't want to be on the hook for that UV mess! Same goes for optimizing the two subsequent blur passes.

For other kinds of inputs that aren't already a rasterized image (like, say, a SolidColorContents that draws a circle), we need a RenderPass to snapshot anyhow, and so we get to absorb the full transform when snapshotting with no loss to performance. For these cases, the Snapshot::transform will just be a translation to position the resulting texture correctly on-screen and nothing else, regardless of whatever nutty Entity transform was given.

A quick fix

So your solution as written works well when the Entity transform is specifically translation-only. This is because under those circumstances, the input_snapshot_coverage accurately shrinkwraps the texture. And both coverage_hint and input_snapshot_coverage are in the same space.

Well, backdrop filters and subpass textures are actually rendered by EntityPass in translated screen/pass space, and so the Entity transform for all backdrop filters already qualifies as translation-only. Backdrop filters are the major pain point that spawned the need for the coverage hint in the first place, so an effective short-term fix would be to just turn this optimization on only when the Entity transform is translation-only.

A comprehensive fix (make it work for ui.image texture inputs too)

The "quick fix" is actually good setup for the more comprehensive fix, so check that out first.

Right now, we take our Entity transform and throw the whole thing at the snapshot input at once. But we could just.... pass a transform containing only the scale of the basis vectors into the snapshot (the reason we'd pass the scale here is that we need to allow non-texture inputs to consume the scale so that e.g. scaled up paths won't end up having a source image that's too pixellated), which will guarantee that we always get a translation-scale-only result back from the snapshot.

Now we can form the full "deferred" transform by normalizing the basis vectors of the original Entity transform and multiplying it with the snapshot's returned transform on the RHS. This "deferred" transform should get multiplied it into the filter's returned Entity transform.

Caution
Since we're deferring the input snapshot's transform, the snapshot's transform should be changed to an identity matrix before calling Snapshot::GetCoverage to compute input_snapshot_coverage. Otherwise the input_snapshot_coverage will be defined in the wrong space for some texture-based filter inputs.

This means that we get to operate in an axis-aligned space that's very close to local space, which allows us to avoid all of the extra UV mapping wackiness that would otherwise be required of us. Deferring most of the entity transform also means that the blur effect will get transformed correctly as well, and even work the same way that Skia does for non-affine/perspective transforms.

So now for the coverage hint. input_snapshot_coverage is no longer in screen space, but our close-to-local space, so we need to convert the coverage_hint (and by extension, any derivatives of the coverage_hint) to conservative bounds in our local-space. This can be done by applying the inverse deferred transform to the rectangle and re-bounding it. And again, the "deferred" transform is the same transform that we deferred to the returned Entity described above. Rect::TransformBounds is intended for precisely this kind of situation:

auto local_coverage_hint = coverage_hint.TransformBounds(deferred_transform.Invert());

From here, the local_coverage_hint can be used everywhere in place of the coverage_hint, and the blurred texture will trim it's sides to hug the coverage hint regardless of the Entity transform, but never cross into it.

This whole thing is a slightly more detailed explanation of the solution I proposed in flutter/flutter#139165 (comment).

@gaaclarke lemme know if you'd like to dig into this in detail at some point. Reading it back, it's a bit dense... But the quick fix should be super simple.

Ah actually looking at the code, it looks like the guard for the quick fix is partially in place.

Okay, I created a new issue to address the rotation case: flutter/flutter#140890. I'm going to land this and deprioritize that while we get other optimizations in and try to address shimmering.

bdero · 2024-01-02T23:10:34Z

impeller/entity/contents/filters/gaussian_blur_filter_contents.cc

+      //            either
+      //            (https://github.com/flutter/engine/pull/49299#issuecomment-1865311438).
+      input_snapshot.has_value() &&
+      input_snapshot.value().transform.IsTranslationScaleOnly()) {


I think this needs to:

check the entity.GetTransform() (rather than the input_snapshot's transform)

check if the transform is translation-only (rather than scale-translation-only)

If either one of these isn't true, then I think coverage_hint and input_snapshot_coverage could end up being in difference spaces in some situations, which will cause a variation of the UV mapping bug.

If you're up for it, one way you could smoke test this issue is by adding a scale param to the interactive Aiks toy.

I added the parameter to the interactive test and added an automated test. Let me know if you are thinking of some other case where this may break.

To be clear it's necessarily that I don't believe there can be a problem; I just don't want to shrink the domain of this optimization without proof that it doesn't work everywhere via a test. That test can serve as the basis for expanding the domain in the future.

Oh yeah, I get you. Also I think you should land this at will, we can patch other issues as we find em. ;)

It would also be good to test a case where the transform+coverage rect is absorbed by the snapshot rather than forwarded. For example: A blur applied to a gradient circle (can't be a solid color since that'll fall into the rrect fast path).

I just tried the interactive clip when drawing a rect with a gradient on a rrect. I wasn't able to see a problem. I'm going to leave the existing test as is since it doesn't seem to demonstrate anything different.

bdero

Overall LGTM! I left one comment about a space issue that's an easy fix.

flutter-dashboard · 2024-01-03T00:19:58Z

Golden file changes have been found for this pull request. Click here to view and triage (e.g. because this is an intentional change).

If you are still iterating on this change and are not ready to resolve the images on the Flutter Gold dashboard, consider marking this PR as a draft pull request above. You will still be able to view image results on the dashboard, commenting will be silenced, and the check will not try to resolve itself until marked ready for review.

Changes reported for pull request #49299 at sha 4a9b2af

…ne#49299)

…140920) flutter/engine@bf232c4...7c2adb8 2024-01-03 [email protected] Roll Skia from 063e339bedfc to 1ce1aa1f90fa (6 revisions) (flutter/engine#49503) 2024-01-03 [email protected] [Impeller] new blur: limit uvs to blur region (flutter/engine#49299) 2024-01-03 [email protected] Roll Skia from 1144b7950404 to 063e339bedfc (7 revisions) (flutter/engine#49500) If this roll has caused a breakage, revert this CL and stop the roller using the controls here: https://autoroll.skia.org/r/flutter-engine-flutter-autoroll Please CC [email protected],[email protected] on the revert to ensure that a human is aware of the problem. To file a bug in Flutter: https://github.com/flutter/flutter/issues/new/choose To report a problem with the AutoRoller itself, please file a bug: https://issues.skia.org/issues/new?component=1389291&template=1850622 Documentation for the AutoRoller is here: https://skia.googlesource.com/buildbot/+doc/main/autoroll/README.md

issue: flutter/flutter#131580 This factors out the gaussian blur fragment shader to be a more generic kernel fragment shader. We can't use this for the old gaussian blur because it's kernel is unbounded in one dimension. I implemented this by passing in the coefficients into the UBO so it can't support that. Profile after change: ``` "average_vsync_transitions_missed": 2.0, "90th_percentile_vsync_transitions_missed": 2.0, "99th_percentile_vsync_transitions_missed": 2.0, "average_vsync_frame_lag": 0.0, "90th_percentile_vsync_frame_lag": 0.0, "99th_percentile_vsync_frame_lag": 0.0, "average_layer_cache_count": 0.0, "90th_percentile_layer_cache_count": 0.0, "99th_percentile_layer_cache_count": 0.0, "average_frame_request_pending_latency": 16636.624373956594, "90th_percentile_frame_request_pending_latency": 16718.0, "99th_percentile_frame_request_pending_latency": 16761.0, "worst_layer_cache_count": 0.0, "average_layer_cache_memory": 0.0, "90th_percentile_layer_cache_memory": 0.0, "99th_percentile_layer_cache_memory": 0.0, "worst_layer_cache_memory": 0.0, "average_picture_cache_count": 0.0, "90th_percentile_picture_cache_count": 0.0, "99th_percentile_picture_cache_count": 0.0, "worst_picture_cache_count": 0.0, "average_picture_cache_memory": 0.0, "90th_percentile_picture_cache_memory": 0.0, "99th_percentile_picture_cache_memory": 0.0, "worst_picture_cache_memory": 0.0, "total_ui_gc_time": 1.65, "30hz_frame_percentage": 0.0, "60hz_frame_percentage": 100.0, "80hz_frame_percentage": 0.0, "90hz_frame_percentage": 0.0, "120hz_frame_percentage": 0.0, "illegal_refresh_rate_frame_count": 0, "average_gpu_frame_time": 21.875, "90th_percentile_gpu_frame_time": 31.25, "99th_percentile_gpu_frame_time": 31.25, "worst_gpu_frame_time": 31.25, "average_cpu_usage": 162.86326532653064, "90th_percentile_cpu_usage": 167.3, "99th_percentile_cpu_usage": 168.500003, "average_gpu_usage": 93.26530612244898, "90th_percentile_gpu_usage": 94.0, "99th_percentile_gpu_usage": 95.0, "average_memory_usage": 113.56696428571429, "90th_percentile_memory_usage": 145.484375, "99th_percentile_memory_usage": 152.75 ``` This is a 13% (21.25 / 25 ) drop in `average_gpu_frame_time` from the last profile in #49299. [C++, Objective-C, Java style guides]: https://github.com/flutter/engine/blob/main/CONTRIBUTING.md#style

github-actions bot added the e: impeller label Dec 20, 2023

gaaclarke force-pushed the blur-limit-uvs branch 2 times, most recently from 04ce9a8 to 7b34ed1 Compare December 20, 2023 23:35

gaaclarke force-pushed the blur-limit-uvs branch 5 times, most recently from 47c4eb7 to 5b723f0 Compare January 2, 2024 21:42

[Impeller] limit blur uvs to the clip region

43d6539

gaaclarke force-pushed the blur-limit-uvs branch from 5b723f0 to 43d6539 Compare January 2, 2024 21:51

gaaclarke marked this pull request as ready for review January 2, 2024 22:09

gaaclarke requested review from bdero and jonahwilliams January 2, 2024 22:09

bdero reviewed Jan 2, 2024

View reviewed changes

bdero approved these changes Jan 2, 2024

View reviewed changes

gaaclarke added 2 commits January 2, 2024 15:16

added assert

8d92328

brandon feedback

4a9b2af

flutter-dashboard bot added the will affect goldens label Jan 3, 2024

gaaclarke mentioned this pull request Jan 3, 2024

[Impeller] new blur: expand coverage clip to work for rotated entities flutter/flutter#140890

Closed

updated comment

a809f2d

gaaclarke added autosubmit Merge PR when tree becomes green via auto submit App and removed autosubmit Merge PR when tree becomes green via auto submit App labels Jan 3, 2024

gaaclarke force-pushed the blur-limit-uvs branch from 8b14439 to a809f2d Compare January 3, 2024 18:51

gaaclarke added the autosubmit Merge PR when tree becomes green via auto submit App label Jan 3, 2024

auto-submit bot merged commit db82740 into flutter:main Jan 3, 2024

engine-flutter-autoroll mentioned this pull request Jan 3, 2024

Roll Flutter Engine from bf232c4da241 to 7c2adb811059 (3 revisions) flutter/flutter#140920

Merged

engine-flutter-autoroll added a commit to engine-flutter-autoroll/flutter that referenced this pull request Jan 3, 2024

db827406f [Impeller] new blur: limit uvs to blur region (flutter/engi…

de1175e

…ne#49299)

gaaclarke mentioned this pull request Jan 4, 2024

[Impeller] migrate blur to calculating coefficients on the cpu #49512

Merged

8 tasks

[Impeller] new blur: limit uvs to blur region #49299

[Impeller] new blur: limit uvs to blur region #49299

Uh oh!

Conversation

gaaclarke commented Dec 20, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

performance results

Pre-launch Checklist

Uh oh!

gaaclarke commented Dec 21, 2023

Uh oh!

gaaclarke commented Jan 2, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

gaaclarke commented Jan 2, 2024

Uh oh!

flutter-dashboard bot commented Jan 2, 2024

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

The issue in detail

A quick fix

A comprehensive fix (make it work for ui.image texture inputs too)

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

bdero left a comment

Choose a reason for hiding this comment

Uh oh!

flutter-dashboard bot commented Jan 3, 2024

Uh oh!

Uh oh!

gaaclarke commented Dec 20, 2023 •

edited

Loading

gaaclarke commented Jan 2, 2024 •

edited

Loading

A comprehensive fix (make it work for `ui.image` texture inputs too)