-
Notifications
You must be signed in to change notification settings - Fork 1.4k
Perf: Fast quoted expression expansion #12009
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Perf: Fast quoted expression expansion #12009
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR significantly improves the performance of quoted expression expansion by avoiding expensive Regex allocations and streamlining metadata resolution. Key changes include replacing the Regex.Replace call with manual matching and concatenation using optimized paths, introducing a FrozenDictionary for common item spec modifiers, and removing the legacy MetadataMatchEvaluator class.
|
||
/// <summary> | ||
/// A precomputed lookup of item spec modifiers wrapped in regex strings. | ||
/// This allows us to completely skip of Regex parsing when the innter string matches a known modifier. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo in comment: 'innter' should be 'inner'.
/// This allows us to completely skip of Regex parsing when the innter string matches a known modifier. | |
/// This allows us to completely skip Regex parsing when the inner string matches a known modifier. |
Copilot uses AI. Check for mistakes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am SO excited about this.
// If we're not a ProjectItem or ProjectItemInstance, then ProjectDirectory will be null. | ||
// In that case, we're safe to get the current directory as we'll be running on TaskItems which | ||
// only exist within a target where we can trust the current directory | ||
string directoryToUse = sourceOfMetadata.ProjectDirectory ?? Directory.GetCurrentDirectory(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[unrelated to this PR] @AR-May we'll have to rethink this assumption for threading.
{ | ||
return new OneOrMultipleMetadataMatches(string.Empty); | ||
} | ||
else if (s_itemSpecModifiers.TryGetValue(match.Value, out cachedName)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you elaborate on this case in a comment? it's successful and it's . . . a single match, that isn't exactly one of the well-formed cases but is "close enough"? Is it like whitespace around the exact spelling so @(Foo->' %(Identity) ')
?
|
||
// Now we run the full loop. | ||
// This is a very hot path, so we avoid allocating this until after we know there are multiple matches. | ||
List<MetadataMatch> multipleMatches = [new MetadataMatch(firstMatch, name)]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you comment an example expression that falls to this case? It makes sense from the code but I'm having a hard time backing into what the XML looked like to get here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly stuff like %(Filename)%(Extension)
, but you get ones like:
Microsoft.NET.GenerateAssemblyInfo.targets
(mix of known modifiers + other)
<Hash ItemsToHash="@(AssemblyAttribute->'%(Identity)%(_Parameter1)%(_Parameter2)%(_Parameter3)%(_Parameter4)%(_Parameter5)%(_Parameter6)%(_Parameter7)%(_Parameter8)')">
Microsoft.NET.Sdk.BeforeCommon.targets
(string interpolation)
<Target Name="GenerateNETCompatibleDefineConstants">
...
<_ImplicitDefineConstant Include="@(_FormattedCompatibleFrameworkVersions->'NETCOREAPP%(Identity)_OR_GREATER'->Replace('.', '_'))" />
src/Build/Evaluation/Expander.cs
Outdated
private enum MetadataMatchType | ||
{ | ||
ExactString, | ||
Single, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe
Single, | |
InexactSingle, |
or something? and probably doc comments for these.
Fixes
Fairly involved change that trivializes this call path (to the point where retrieving the metadata costs more than the regex!)
Context
This bit of code is responsible for nearly all the allocations and runtime here:
Replace
takes a user-provided method, which is given each individualMatch
and spits out a string to concat as the result.As you see below though, the vast majority of allocations aren't even string-related, but rather the
Groups
-related objects and internal arrays thatMatch
creates the first time you access theGroups
property.This is basically impossible to avoid since neither .NET Framework or .NET Core provide an allocation-free way to access a single group that you're interested in, whether by name or ID, or reuse the related objects. The closest thing is a discussion to potentially introduce a
ValueGroup
object similar toValueMatch
that can be enumerated, but that appears to be a ways off.This means in order to solve this, we have to find ways to avoid entering the Regex path entirely.
Taken from
ExpandQuotedExpressionFunction
on main:Taken from
MetadataMatchEvaluator
on main:Changes Made
Besides the core part of avoiding
Regex.Replace()
and manually iterating the match object, there are a few observations to make here:%(ItemSpecModifier)
- therefore we can avoid processing these at all by just doing a simple dictionary lookup.Many two-match cases are also just a pair of itemspec modifiers - therefore we can avoid accessing the expensive
GroupCollection
allocation by performing a lookup on each match iteration as well.Most matches are the single-match case. We can avoid the vast majority of collection / array allocations by wrapping the result in a discriminated union of "one-or-more" matches. In the full match case, we can additional avoid any string allocations as well.