Skip to content
This repository was archived by the owner on Feb 25, 2025. It is now read-only.

[Impeller] new blur: limit uvs to blur region #49299

Merged
merged 4 commits into from
Jan 3, 2024

Conversation

gaaclarke
Copy link
Member

@gaaclarke gaaclarke commented Dec 20, 2023

This will run the blur passes on a subset of the texture passed into it that will actually be blurred. This is an optimization for backdrop filters.

issue: flutter/flutter#131580
test coverage: devicelab backdrop_filter_perf_ios__timeline_summary

performance results

This results in a 50% reduction in average GPU time in our benchmark, 79% reduction in the "99% percentile".

BEFORE

    "average_vsync_transitions_missed": 3.889423076923077,
    "90th_percentile_vsync_transitions_missed": 4.0,
    "99th_percentile_vsync_transitions_missed": 4.0,
    "average_vsync_frame_lag": 0.0,
    "90th_percentile_vsync_frame_lag": 0.0,
    "99th_percentile_vsync_frame_lag": 0.0,
    "average_layer_cache_count": 0.0,
    "90th_percentile_layer_cache_count": 0.0,
    "99th_percentile_layer_cache_count": 0.0,
    "average_frame_request_pending_latency": 16654.368333333332,
    "90th_percentile_frame_request_pending_latency": 16692.0,
    "99th_percentile_frame_request_pending_latency": 16749.0,
    "worst_layer_cache_count": 0.0,
    "average_layer_cache_memory": 0.0,
    "90th_percentile_layer_cache_memory": 0.0,
    "99th_percentile_layer_cache_memory": 0.0,
    "worst_layer_cache_memory": 0.0,
    "average_picture_cache_count": 0.0,
    "90th_percentile_picture_cache_count": 0.0,
    "99th_percentile_picture_cache_count": 0.0,
    "worst_picture_cache_count": 0.0,
    "average_picture_cache_memory": 0.0,
    "90th_percentile_picture_cache_memory": 0.0,
    "99th_percentile_picture_cache_memory": 0.0,
    "worst_picture_cache_memory": 0.0,
    "total_ui_gc_time": 0.659,
    "30hz_frame_percentage": 0.0,
    "60hz_frame_percentage": 100.0,
    "80hz_frame_percentage": 0.0,
    "90hz_frame_percentage": 0.0,
    "120hz_frame_percentage": 0.0,
    "illegal_refresh_rate_frame_count": 0,
    "average_gpu_frame_time": 52.13341346153846,
    "90th_percentile_gpu_frame_time": 62.5,
    "99th_percentile_gpu_frame_time": 62.5,
    "worst_gpu_frame_time": 62.5,
    "average_cpu_usage": 69.08979595918369,
    "90th_percentile_cpu_usage": 70.4,
    "99th_percentile_cpu_usage": 71.699999,
    "average_gpu_usage": 100.0,
    "90th_percentile_gpu_usage": 100.0,
    "99th_percentile_gpu_usage": 100.0,
    "average_memory_usage": 118.79942602040816,
    "90th_percentile_memory_usage": 138.125,
    "99th_percentile_memory_usage": 143.65625

AFTER

    "average_vsync_transitions_missed": 2.0,
    "90th_percentile_vsync_transitions_missed": 2.0,
    "99th_percentile_vsync_transitions_missed": 2.0,
    "average_vsync_frame_lag": 0.0,
    "90th_percentile_vsync_frame_lag": 0.0,
    "99th_percentile_vsync_frame_lag": 0.0,
    "average_layer_cache_count": 0.0,
    "90th_percentile_layer_cache_count": 0.0,
    "99th_percentile_layer_cache_count": 0.0,
    "average_frame_request_pending_latency": 16635.025,
    "90th_percentile_frame_request_pending_latency": 16715.0,
    "99th_percentile_frame_request_pending_latency": 16802.0,
    "worst_layer_cache_count": 0.0,
    "average_layer_cache_memory": 0.0,
    "90th_percentile_layer_cache_memory": 0.0,
    "99th_percentile_layer_cache_memory": 0.0,
    "worst_layer_cache_memory": 0.0,
    "average_picture_cache_count": 0.0,
    "90th_percentile_picture_cache_count": 0.0,
    "99th_percentile_picture_cache_count": 0.0,
    "worst_picture_cache_count": 0.0,
    "average_picture_cache_memory": 0.0,
    "90th_percentile_picture_cache_memory": 0.0,
    "99th_percentile_picture_cache_memory": 0.0,
    "worst_picture_cache_memory": 0.0,
    "total_ui_gc_time": 1.732,
    "30hz_frame_percentage": 0.0,
    "60hz_frame_percentage": 100.0,
    "80hz_frame_percentage": 0.0,
    "90hz_frame_percentage": 0.0,
    "120hz_frame_percentage": 0.0,
    "illegal_refresh_rate_frame_count": 0,
    "average_gpu_frame_time": 25.01558603491272,
    "90th_percentile_gpu_frame_time": 31.25,
    "99th_percentile_gpu_frame_time": 31.25,

Pre-launch Checklist

  • I read the Contributor Guide and followed the process outlined there for submitting PRs.
  • I read the Tree Hygiene wiki page, which explains my responsibilities.
  • I read and followed the Flutter Style Guide and the C++, Objective-C, Java style guides.
  • I listed at least one issue that this PR fixes in the description above.
  • I added new tests to check the change I am making or feature I am adding, or the PR is test-exempt. See testing the engine for instructions on writing and running engine tests.
  • I updated/added relevant documentation (doc comments with ///).
  • I signed the CLA.
  • All existing and new tests are passing.

If you need help, consider asking for advice on the #hackers-new channel on Discord.

@gaaclarke gaaclarke force-pushed the blur-limit-uvs branch 2 times, most recently from 04ce9a8 to 7b34ed1 Compare December 20, 2023 23:35
@gaaclarke
Copy link
Member Author

This doesn't quite work yet because of the rotation case. I attempted to switch the calculation to source coordinates but that doesn't seem to work:

  Quad blur_uvs = {Point(0, 0), Point(1, 0), Point(0, 1), Point(1, 1)};
  if (expanded_coverage_hint.has_value() &&
      input_snapshot_coverage.has_value()) {
    Matrix local_to_source =
        (entity.GetTransform()).Invert();
    Quad source_expanded_coverage_hint_quad =
        local_to_source.Transform(expanded_coverage_hint->GetPoints());
    Quad source_input_snapshot_coverage_quad =
        local_to_source.Transform(input_snapshot_coverage->GetPoints());
    Rect source_expanded_coverage_hint =
        Rect::MakePointBounds(source_expanded_coverage_hint_quad).value();
    Rect source_input_snapshot_coverage =
        Rect::MakePointBounds(source_input_snapshot_coverage_quad).value();
    std::optional<Rect> uvs = MakeReferenceUVs(source_input_snapshot_coverage,
                                               source_expanded_coverage_hint)
                                  .Intersection(Rect::MakeSize(Size(1, 1)));
    FML_DCHECK(uvs.has_value());
    if (uvs.has_value()) {
      blur_uvs[0] = uvs->GetLeftTop();
      blur_uvs[1] = uvs->GetRightTop();
      blur_uvs[2] = uvs->GetLeftBottom();
      blur_uvs[3] = uvs->GetRightBottom();
    }
  }

@gaaclarke gaaclarke force-pushed the blur-limit-uvs branch 5 times, most recently from 47c4eb7 to 5b723f0 Compare January 2, 2024 21:42
@gaaclarke
Copy link
Member Author

gaaclarke commented Jan 2, 2024

After this change we are 4x faster than the old blur in the benchmark: #48472 (comment)
(average_gpu_usage: 100.0 vs 25.02).

@gaaclarke
Copy link
Member Author

I punted on making this work for rotated entities. I think there is a bug in the coverage of rotated entities and since this optimization is built on that, it was demonstrating that bug. Bugs in coverage are less of a big deal unless rendering is based on it.

@gaaclarke gaaclarke marked this pull request as ready for review January 2, 2024 22:09
@flutter-dashboard
Copy link

It looks like this pull request may not have tests. Please make sure to add tests before merging. If you need an exemption to this rule, contact "@test-exemption-reviewer" in the #hackers channel in Chat (don't just cc them here, they won't see it! Use Discord!).

If you are not sure if you need tests, consider this rule of thumb: the purpose of a test is to make sure someone doesn't accidentally revert the fix. Ask yourself, is there anything in your PR that you feel it is important we not accidentally revert back to how it was before your fix?

Reviewers: Read the Tree Hygiene page and make sure this patch meets those guidelines before LGTMing.

blur_uvs[2] = uvs->GetLeftBottom();
blur_uvs[3] = uvs->GetRightBottom();
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you pointed out, this doesn't work when the basis of the entity transform isn't identity. Video of the issue:

Screen.Recording.2023-12-20.at.7.19.06.PM.mov

Just so we're on the same page: Snapshots are an instruction for how to render a texture consisting of sampling state, a transform, and the texture itself.

The issue in detail

This issue happens because the coverage_hint is given in screen/pass space and texture-based filter inputs (like TiledTextureContents or TextureContents) create snapshots that are in local space. So the Entity's transform will simply be forwarded through to the Snapshot::transform field. So the act of snapshotting incurs zero rendering overhead by deferring the transform via the snapshot protocol.

Filters (and everything else that consumes snapshots) must contend with this behavior. And that's a very good thing! Because absorbing the snapshot would mean burdening the blur workload with a very expensive 4th RenderPass that does nothing but sample the entire original image and print it to a possibly oversized texture with a bunch of unused fully transparent area. And that unused transparent area would be a very annoying problem to deal with in the blur, because we'd need to carefully work around its presence when generating the outer sampling region in the second pass. I wouldn't want to be on the hook for that UV mess! Same goes for optimizing the two subsequent blur passes.

For other kinds of inputs that aren't already a rasterized image (like, say, a SolidColorContents that draws a circle), we need a RenderPass to snapshot anyhow, and so we get to absorb the full transform when snapshotting with no loss to performance. For these cases, the Snapshot::transform will just be a translation to position the resulting texture correctly on-screen and nothing else, regardless of whatever nutty Entity transform was given.

A quick fix

So your solution as written works well when the Entity transform is specifically translation-only. This is because under those circumstances, the input_snapshot_coverage accurately shrinkwraps the texture. And both coverage_hint and input_snapshot_coverage are in the same space.

Well, backdrop filters and subpass textures are actually rendered by EntityPass in translated screen/pass space, and so the Entity transform for all backdrop filters already qualifies as translation-only. Backdrop filters are the major pain point that spawned the need for the coverage hint in the first place, so an effective short-term fix would be to just turn this optimization on only when the Entity transform is translation-only.

A comprehensive fix (make it work for ui.image texture inputs too)

The "quick fix" is actually good setup for the more comprehensive fix, so check that out first.

Right now, we take our Entity transform and throw the whole thing at the snapshot input at once. But we could just.... pass a transform containing only the scale of the basis vectors into the snapshot (the reason we'd pass the scale here is that we need to allow non-texture inputs to consume the scale so that e.g. scaled up paths won't end up having a source image that's too pixellated), which will guarantee that we always get a translation-scale-only result back from the snapshot.

Now we can form the full "deferred" transform by normalizing the basis vectors of the original Entity transform and multiplying it with the snapshot's returned transform on the RHS. This "deferred" transform should get multiplied it into the filter's returned Entity transform.

Caution

Since we're deferring the input snapshot's transform, the snapshot's transform should be changed to an identity matrix before calling Snapshot::GetCoverage to compute input_snapshot_coverage. Otherwise the input_snapshot_coverage will be defined in the wrong space for some texture-based filter inputs.

This means that we get to operate in an axis-aligned space that's very close to local space, which allows us to avoid all of the extra UV mapping wackiness that would otherwise be required of us. Deferring most of the entity transform also means that the blur effect will get transformed correctly as well, and even work the same way that Skia does for non-affine/perspective transforms.

So now for the coverage hint. input_snapshot_coverage is no longer in screen space, but our close-to-local space, so we need to convert the coverage_hint (and by extension, any derivatives of the coverage_hint) to conservative bounds in our local-space. This can be done by applying the inverse deferred transform to the rectangle and re-bounding it. And again, the "deferred" transform is the same transform that we deferred to the returned Entity described above. Rect::TransformBounds is intended for precisely this kind of situation:

auto local_coverage_hint = coverage_hint.TransformBounds(deferred_transform.Invert());

From here, the local_coverage_hint can be used everywhere in place of the coverage_hint, and the blurred texture will trim it's sides to hug the coverage hint regardless of the Entity transform, but never cross into it.

This whole thing is a slightly more detailed explanation of the solution I proposed in flutter/flutter#139165 (comment).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gaaclarke lemme know if you'd like to dig into this in detail at some point. Reading it back, it's a bit dense... But the quick fix should be super simple.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah actually looking at the code, it looks like the guard for the quick fix is partially in place.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I created a new issue to address the rotation case: flutter/flutter#140890. I'm going to land this and deprioritize that while we get other optimizations in and try to address shimmering.

// either
// (https://github.com/flutter/engine/pull/49299#issuecomment-1865311438).
input_snapshot.has_value() &&
input_snapshot.value().transform.IsTranslationScaleOnly()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs to:

  • check the entity.GetTransform() (rather than the input_snapshot's transform)
  • check if the transform is translation-only (rather than scale-translation-only)

If either one of these isn't true, then I think coverage_hint and input_snapshot_coverage could end up being in difference spaces in some situations, which will cause a variation of the UV mapping bug.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're up for it, one way you could smoke test this issue is by adding a scale param to the interactive Aiks toy.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the parameter to the interactive test and added an automated test. Let me know if you are thinking of some other case where this may break.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear it's necessarily that I don't believe there can be a problem; I just don't want to shrink the domain of this optimization without proof that it doesn't work everywhere via a test. That test can serve as the basis for expanding the domain in the future.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah, I get you. Also I think you should land this at will, we can patch other issues as we find em. ;)

It would also be good to test a case where the transform+coverage rect is absorbed by the snapshot rather than forwarded. For example: A blur applied to a gradient circle (can't be a solid color since that'll fall into the rrect fast path).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just tried the interactive clip when drawing a rect with a gradient on a rrect. I wasn't able to see a problem. I'm going to leave the existing test as is since it doesn't seem to demonstrate anything different.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM

Copy link
Member

@bdero bdero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM! I left one comment about a space issue that's an easy fix.

@flutter-dashboard
Copy link

Golden file changes have been found for this pull request. Click here to view and triage (e.g. because this is an intentional change).

If you are still iterating on this change and are not ready to resolve the images on the Flutter Gold dashboard, consider marking this PR as a draft pull request above. You will still be able to view image results on the dashboard, commenting will be silenced, and the check will not try to resolve itself until marked ready for review.

Changes reported for pull request #49299 at sha 4a9b2af

@gaaclarke gaaclarke added autosubmit Merge PR when tree becomes green via auto submit App and removed autosubmit Merge PR when tree becomes green via auto submit App labels Jan 3, 2024
@gaaclarke gaaclarke added the autosubmit Merge PR when tree becomes green via auto submit App label Jan 3, 2024
@auto-submit auto-submit bot merged commit db82740 into flutter:main Jan 3, 2024
engine-flutter-autoroll added a commit to engine-flutter-autoroll/flutter that referenced this pull request Jan 3, 2024
auto-submit bot pushed a commit to flutter/flutter that referenced this pull request Jan 3, 2024
…140920)

flutter/engine@bf232c4...7c2adb8

2024-01-03 [email protected] Roll Skia from 063e339bedfc to 1ce1aa1f90fa (6 revisions) (flutter/engine#49503)
2024-01-03 [email protected] [Impeller] new blur: limit uvs to blur region (flutter/engine#49299)
2024-01-03 [email protected] Roll Skia from 1144b7950404 to 063e339bedfc (7 revisions) (flutter/engine#49500)

If this roll has caused a breakage, revert this CL and stop the roller
using the controls here:
https://autoroll.skia.org/r/flutter-engine-flutter-autoroll
Please CC [email protected],[email protected] on the revert to ensure that a human
is aware of the problem.

To file a bug in Flutter: https://github.com/flutter/flutter/issues/new/choose

To report a problem with the AutoRoller itself, please file a bug:
https://issues.skia.org/issues/new?component=1389291&template=1850622

Documentation for the AutoRoller is here:
https://skia.googlesource.com/buildbot/+doc/main/autoroll/README.md
auto-submit bot pushed a commit that referenced this pull request Jan 4, 2024
issue: flutter/flutter#131580

This factors out the gaussian blur fragment shader to be a more generic kernel fragment shader.  We can't use this for the old gaussian blur because it's kernel is unbounded in one dimension.  I implemented this by passing in the coefficients into the UBO so it can't support that.

Profile after change:
```
    "average_vsync_transitions_missed": 2.0,
    "90th_percentile_vsync_transitions_missed": 2.0,
    "99th_percentile_vsync_transitions_missed": 2.0,
    "average_vsync_frame_lag": 0.0,
    "90th_percentile_vsync_frame_lag": 0.0,
    "99th_percentile_vsync_frame_lag": 0.0,
    "average_layer_cache_count": 0.0,
    "90th_percentile_layer_cache_count": 0.0,
    "99th_percentile_layer_cache_count": 0.0,
    "average_frame_request_pending_latency": 16636.624373956594,
    "90th_percentile_frame_request_pending_latency": 16718.0,
    "99th_percentile_frame_request_pending_latency": 16761.0,
    "worst_layer_cache_count": 0.0,
    "average_layer_cache_memory": 0.0,
    "90th_percentile_layer_cache_memory": 0.0,
    "99th_percentile_layer_cache_memory": 0.0,
    "worst_layer_cache_memory": 0.0,
    "average_picture_cache_count": 0.0,
    "90th_percentile_picture_cache_count": 0.0,
    "99th_percentile_picture_cache_count": 0.0,
    "worst_picture_cache_count": 0.0,
    "average_picture_cache_memory": 0.0,
    "90th_percentile_picture_cache_memory": 0.0,
    "99th_percentile_picture_cache_memory": 0.0,
    "worst_picture_cache_memory": 0.0,
    "total_ui_gc_time": 1.65,
    "30hz_frame_percentage": 0.0,
    "60hz_frame_percentage": 100.0,
    "80hz_frame_percentage": 0.0,
    "90hz_frame_percentage": 0.0,
    "120hz_frame_percentage": 0.0,
    "illegal_refresh_rate_frame_count": 0,
    "average_gpu_frame_time": 21.875,
    "90th_percentile_gpu_frame_time": 31.25,
    "99th_percentile_gpu_frame_time": 31.25,
    "worst_gpu_frame_time": 31.25,
    "average_cpu_usage": 162.86326532653064,
    "90th_percentile_cpu_usage": 167.3,
    "99th_percentile_cpu_usage": 168.500003,
    "average_gpu_usage": 93.26530612244898,
    "90th_percentile_gpu_usage": 94.0,
    "99th_percentile_gpu_usage": 95.0,
    "average_memory_usage": 113.56696428571429,
    "90th_percentile_memory_usage": 145.484375,
    "99th_percentile_memory_usage": 152.75
```

This is a 13% (21.25 / 25 ) drop in `average_gpu_frame_time` from the last profile in #49299.

[C++, Objective-C, Java style guides]: https://github.com/flutter/engine/blob/main/CONTRIBUTING.md#style
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
autosubmit Merge PR when tree becomes green via auto submit App e: impeller will affect goldens
Projects
No open projects
Status: ✅ Done
Development

Successfully merging this pull request may close these issues.

2 participants