Perf: Fast quoted expression expansion #12009

Open: 8 commits to merge into base: main


@ccastanedaucf (Contributor) commented Jun 12, 2025

Fixes

Fairly involved change that trivializes this call path (to the point where retrieving the metadata costs more than the regex!)

Context

This bit of code is responsible for nearly all the allocations and runtime here:

matchEvaluator = new MetadataMatchEvaluator(item.Key, item.Value, elementLocation);
include = RegularExpressions.ItemMetadataRegex.Replace(arguments[0], matchEvaluator.GetMetadataValueFromMatch);

Regex.Replace takes a user-provided delegate, which is called with each individual Match and returns the string to substitute into the result.

As you see below though, the vast majority of allocations aren't even string-related, but rather the Groups-related objects and internal arrays that Match creates the first time you access the Groups property.

// Match.cs
public virtual GroupCollection Groups => _groupcoll ??= new GroupCollection(this, null);

// GroupCollection.cs
private readonly Match _match;
private readonly Hashtable? _captureMap;

/// <summary>Cache of Group objects fed to the user.</summary>
private Group[]? _groups;

private Group GetGroupImpl(int groupnum)
{
    if (groupnum == 0)
    {
        return _match;
    }

    // Construct all the Group objects the first time GetGroup is called
    if (_groups is null)
    {
        _groups = new Group[_match._matchcount.Length - 1];
        for (int i = 0; i < _groups.Length; i++)
        {
            string groupname = _match._regex!.GroupNameFromNumber(i + 1);
            _groups[i] = new Group(_match.Text, _match._matches[i + 1], _match._matchcount[i + 1], groupname);
        }
    }

    return _groups[groupnum - 1];
}

This is basically impossible to avoid, since neither .NET Framework nor .NET Core provides an allocation-free way to access the single group you're interested in (whether by name or ID) or to reuse the related objects. The closest thing is a discussion about potentially introducing a ValueGroup object, similar to ValueMatch, that can be enumerated, but that appears to be a ways off.

This means in order to solve this, we have to find ways to avoid entering the Regex path entirely.
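
Where the regex itself cannot be skipped, the matches can still be walked by hand without ever touching Groups. The following is a minimal sketch of that idea, not the code in this PR: the pattern is a simplified stand-in that only recognizes %(Name), and Expand and the metadata dictionary are hypothetical names used for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

// Simplified stand-in for the real metadata pattern (assumption): matches only %(Name).
var metadataRegex = new Regex(@"%\(\w+\)", RegexOptions.Compiled);

// Hypothetical expander: iterates matches manually and never reads Match.Groups.
string Expand(string expression, IReadOnlyDictionary<string, string> metadata)
{
    Match match = metadataRegex.Match(expression);
    if (!match.Success)
    {
        return expression; // Nothing to expand, so return the input unchanged.
    }

    var builder = new StringBuilder(expression.Length);
    int lastIndex = 0;
    while (match.Success)
    {
        // Copy the literal text between the previous match and this one.
        builder.Append(expression, lastIndex, match.Index - lastIndex);

        // Slice the name out of "%(Name)" using only Index and Length.
        string name = expression.Substring(match.Index + 2, match.Length - 3);
        builder.Append(metadata.TryGetValue(name, out var value) ? value : string.Empty);

        lastIndex = match.Index + match.Length;
        match = match.NextMatch();
    }

    // Copy whatever trails the final match.
    builder.Append(expression, lastIndex, expression.Length - lastIndex);
    return builder.ToString();
}
```

Because the name is recovered from Index and Length alone, the GroupCollection and its Group array are never constructed. The real pattern has more than one shape to handle, so the slicing here would not carry over unchanged.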
 
Taken from ExpandQuotedExpressionFunction on main:

image

Taken from MetadataMatchEvaluator on main:

image

Changes Made

Besides the core part of avoiding Regex.Replace() and manually iterating the match object, there are a few observations to make here:

  1. The vast majority of strings here are an exact match of %(ItemSpecModifier), so we can skip processing them entirely with a simple dictionary lookup.
private static readonly FrozenDictionary<string, string> s_itemSpecModifiers = new Dictionary<string, string>()
{
    [$"%({ItemSpecModifiers.FullPath})"] = ItemSpecModifiers.FullPath,
    [$"%({ItemSpecModifiers.RootDir})"] = ItemSpecModifiers.RootDir,
    // ... etc.
}.ToFrozenDictionary();
  2. Many strings passed through here are identical to the previous run, so a simple single-reference cache avoids another large set of lookups. This is "thread-safe" since reference reads/writes are atomic, and we always work on a local reference and just overwrite.
private static string s_lastParsedQuotedExpression;
  3. Many two-match cases are also just a pair of itemspec modifiers, so we can avoid the expensive GroupCollection allocation by performing a lookup on each match iteration as well.

  4. Most matches are the single-match case. We can avoid the vast majority of collection / array allocations by wrapping the result in a discriminated union of "one-or-more" matches. In the full match case, we can additionally avoid any string allocations.

private enum MetadataMatchType
{
    ExactString,
    Single,
    Multiple,
}

readonly struct OneOrMultipleMetadataMatches
{
    internal MetadataMatch Single { get; }

    internal List<MetadataMatch> Multiple { get; }

    internal MetadataMatchType Type { get; }
}
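
To make the first two observations concrete, here is a minimal sketch of how an exact-match lookup and a single-entry cache can sit in front of the expensive parse. It is an illustration under assumptions, not the code in this PR: knownModifiers holds a hypothetical subset of modifiers, SlowParse stands in for the regex path, and the cache is modeled as one immutable pair held in a local rather than a static field.

```csharp
using System;
using System.Collections.Frozen;
using System.Collections.Generic;

// Hypothetical subset of well-known modifiers, keyed by their "%(Name)" spelling.
FrozenDictionary<string, string> knownModifiers = new Dictionary<string, string>
{
    ["%(FullPath)"] = "FullPath",
    ["%(Filename)"] = "Filename",
    ["%(Extension)"] = "Extension",
}.ToFrozenDictionary();

// Single-entry cache: one immutable pair, replaced with a single reference write.
Tuple<string, string>? lastParsed = null;
int slowPathCalls = 0;

// Stand-in for the expensive regex-based parse (assumption for this sketch).
string SlowParse(string expression)
{
    slowPathCalls++;
    return expression.Trim().TrimStart('%', '(').TrimEnd(')');
}

string GetMetadataName(string expression)
{
    // Fast path 1: the whole string is exactly a known modifier.
    if (knownModifiers.TryGetValue(expression, out var name))
    {
        return name;
    }

    // Fast path 2: same expression as the previous call.
    // Read the shared reference once into a local and only use the local afterwards.
    var cached = lastParsed;
    if (cached is not null && cached.Item1 == expression)
    {
        return cached.Item2;
    }

    string parsed = SlowParse(expression);
    lastParsed = Tuple.Create(expression, parsed); // Overwrite; the last writer wins.
    return parsed;
}
```

Keeping the key and the value in one immutable object means a reader sees either the old pair or the new pair, never a mix of the two, which is what makes the plain overwrite acceptable without a lock.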

@Copilot Copilot AI review requested due to automatic review settings June 12, 2025 21:16
@Copilot Copilot AI left a comment

Pull Request Overview

This PR significantly improves the performance of quoted expression expansion by avoiding expensive Regex allocations and streamlining metadata resolution. Key changes include replacing the Regex.Replace call with manual matching and concatenation using optimized paths, introducing a FrozenDictionary for common item spec modifiers, and removing the legacy MetadataMatchEvaluator class.


/// <summary>
/// A precomputed lookup of item spec modifiers wrapped in regex strings.
/// This allows us to completely skip of Regex parsing when the innter string matches a known modifier.

Copilot AI Jun 12, 2025

Typo in comment: 'innter' should be 'inner'.

Suggested change:
- /// This allows us to completely skip of Regex parsing when the innter string matches a known modifier.
+ /// This allows us to completely skip Regex parsing when the inner string matches a known modifier.

Member

@rainersigwald rainersigwald left a comment

I am SO excited about this.

// If we're not a ProjectItem or ProjectItemInstance, then ProjectDirectory will be null.
// In that case, we're safe to get the current directory as we'll be running on TaskItems which
// only exist within a target where we can trust the current directory
string directoryToUse = sourceOfMetadata.ProjectDirectory ?? Directory.GetCurrentDirectory();
Member

[unrelated to this PR] @AR-May we'll have to rethink this assumption for threading.

{
    return new OneOrMultipleMetadataMatches(string.Empty);
}
else if (s_itemSpecModifiers.TryGetValue(match.Value, out cachedName))
Member

could you elaborate on this case in a comment? it's successful and it's . . . a single match, that isn't exactly one of the well-formed cases but is "close enough"? Is it like whitespace around the exact spelling so @(Foo->' %(Identity) ')?


// Now we run the full loop.
// This is a very hot path, so we avoid allocating this until after we know there are multiple matches.
List<MetadataMatch> multipleMatches = [new MetadataMatch(firstMatch, name)];
Member

Can you comment an example expression that falls to this case? It makes sense from the code but I'm having a hard time backing into what the XML looked like to get here.

Contributor Author

Mostly stuff like %(Filename)%(Extension), but you get ones like:

Microsoft.NET.GenerateAssemblyInfo.targets (mix of known modifiers + other)

<Hash ItemsToHash="@(AssemblyAttribute->'%(Identity)%(_Parameter1)%(_Parameter2)%(_Parameter3)%(_Parameter4)%(_Parameter5)%(_Parameter6)%(_Parameter7)%(_Parameter8)')">

Microsoft.NET.Sdk.BeforeCommon.targets (string interpolation)

<Target Name="GenerateNETCompatibleDefineConstants">
...
    <_ImplicitDefineConstant Include="@(_FormattedCompatibleFrameworkVersions->'NETCOREAPP%(Identity)_OR_GREATER'->Replace('.', '_'))" />
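
As a rough illustration of how expressions like these sort into the different shapes, here is a small sketch. It reflects one plausible reading of the three cases and is not the PR's logic: the pattern is a simplified stand-in that only recognizes %(Name), and Classify and its string labels are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Simplified stand-in for the real metadata pattern (assumption): matches only %(Name).
var metadataRegex = new Regex(@"%\(\w+\)");

// Hypothetical classifier: reports the shape of an expression by counting matches.
string Classify(string expression)
{
    Match first = metadataRegex.Match(expression);
    if (!first.Success)
    {
        return "none";
    }

    if (first.NextMatch().Success)
    {
        return "multiple";
    }

    // One match: exact when it spans the entire input, otherwise embedded in other text.
    return first.Length == expression.Length ? "exact" : "single";
}

Console.WriteLine(Classify("%(Identity)"));                      // exact
Console.WriteLine(Classify("NETCOREAPP%(Identity)_OR_GREATER")); // single
Console.WriteLine(Classify("%(Filename)%(Extension)"));          // multiple
```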

private enum MetadataMatchType
{
    ExactString,
    Single,
Member

Maybe

Suggested change:
- Single,
+ InexactSingle,

or something? and probably doc comments for these.
