Skip to content

Avoid match looping in sourceMappingURL tag detection by using negative lookahead #58704

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

Llorx
Copy link
Contributor

@Llorx Llorx commented Jun 13, 2025

I think that a single regex is better than looping a regex multiple times. The code is smaller, using const and such.

@nodejs-github-bot nodejs-github-bot added the needs-ci PRs that need a full CI run. label Jun 13, 2025
@juanarbol
Copy link
Member

I can't disagree with code simplicity, do you have benchmarks? I mean, regexp is not the nicest thing to maintain

@Llorx
Copy link
Contributor Author

Llorx commented Jun 13, 2025

I can't disagree with code simplicity, do you have benchmarks?

I didn't run any benchmark. Going to do a PoC with iso-bench and post the result.

I mean, regexp is not the nicest thing to maintain

Yeah, that's true... 😔

@Llorx
Copy link
Contributor Author

Llorx commented Jun 13, 2025

EDIT: Invalid test. Rewriting...

Copy link

codecov bot commented Jun 13, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 90.07%. Comparing base (5fe7800) to head (78040e3).
Report is 37 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #58704      +/-   ##
==========================================
- Coverage   90.16%   90.07%   -0.09%     
==========================================
  Files         637      640       +3     
  Lines      188001   188287     +286     
  Branches    36881    36915      +34     
==========================================
+ Hits       169509   169602      +93     
- Misses      11238    11400     +162     
- Partials     7254     7285      +31     

see 72 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@Llorx
Copy link
Contributor Author

Llorx commented Jun 13, 2025

Done. Summary:

  • A bit of improvement on single comments, and equalizing the more lines the file has (from 1.16x to 1.00x).
  • A lot of worsening on multiple comments, but getting a bit better the more lines the file has (from 17.82x to 14.49x).

I'm not sure the amount of files with multiple comments that we are going to find. Still, can process from thousands to millions of files per second.

I agree that this is making the regex more difficult to read, though. Negative lookaheads are tricky.

I don't know how do you feel about this. I don't event know how I feel about this. In non-performance-critical places I prefer simpler code, and I'm not sure if processing thousands of sourcemaps per second is considered low performance in this context. I don't know how many projects with files with 2000+ lines and multiple sourcemap comments are around there. Still, if the project has 500 files like that (wich are a lot in my opinion), the performance difference is not going to be noticeable I think.

Benchmark code:

const { IsoBench } = require("iso-bench");

const negativeLookaheadRegex = /(\/[*/]#\s+sourceMappingURL=)(?<sourceMappingURL>[^\s]+)(?![\s\S]*\1)/;
const multipleMatchRegex = /\/[*/]#\s+sourceMappingURL=(?<sourceMappingURL>[^\s]+)/g;

const singleComment = {
    "2lines": `test string yay ok
//# sourceMappingURL=erfgasdfgasdfgasdgasdg`,

    "20lines": `${`test string yay ok\n`.repeat(19)}
//# sourceMappingURL=erfgasdfgasdfgasdgasdg`,

    "200lines": `${`test string yay ok\n`.repeat(199)}
//# sourceMappingURL=erfgasdfgasdfgasdgasdg`,

    "2000lines": `${`test string yay ok\n`.repeat(1990)}
//# sourceMappingURL=erfgasdfgasdfgasdgasdg`
};
const doubleComment = {
    "2lines": `//# sourceMappingURL=g44tgbh463h643h6
//# sourceMappingURL=erfgasdfgasdfgasdgasdg`,

    "20lines": `${new Array(19).fill(0).map((_, i) => {
        return i % 10 === 9 ? `//# sourceMappingURL=g44tgbh463h643h6${i}` : `test string yay ok ${i}`;
    }).join("\n")}}
//# sourceMappingURL=erfgasdfgasdfgasdgasdg`,

    "200lines": `${new Array(199).fill(0).map((_, i) => {
        return i % 100 === 99 ? `//# sourceMappingURL=g44tgbh463h643h6${i}` : `test string yay ok ${i}`;
    }).join("\n")}}
//# sourceMappingURL=erfgasdfgasdfgasdgasdg`,

    "2000lines": `${new Array(1999).fill(0).map((_, i) => {
        return i % 1000 === 999 ? `//# sourceMappingURL=g44tgbh463h643h6${i}` : `test string yay ok ${i}`;
    }).join("\n")}}
//# sourceMappingURL=erfgasdfgasdfgasdgasdg`
};

function matchNegativeLookahead(code) {
    const match = negativeLookaheadRegex.exec(code);
    if (match == null || !match.groups) {
        return null;
    }
    return match.groups.sourceMappingURL;
}
function matchLoop(code) {
    let match;
    let lastMatch;
    while ((match = multipleMatchRegex.exec(code))) {
        lastMatch = match;
    }
    if (lastMatch == null || !lastMatch.groups) {
        return null;
    }
    return lastMatch.groups.sourceMappingURL;
}
function add(bench, code) {
    return bench.add("negative lookahead", () => {
        return matchNegativeLookahead(code);
    }).add("match loop", () => {
        return matchLoop(code);
    });
}
const bench = new IsoBench();
add(bench, singleComment["2lines"]).endGroup("Single comment: 2 lines");
add(bench, singleComment["20lines"]).endGroup("Single comment: 20 lines");
add(bench, singleComment["200lines"]).endGroup("Single comment: 200 lines");
add(bench, singleComment["2000lines"]).endGroup("Single comment: 2000 lines");
add(bench, doubleComment["2lines"]).endGroup("Multiple comment: 2 lines");
add(bench, doubleComment["20lines"]).endGroup("Multiple comment: 20 lines");
add(bench, doubleComment["200lines"]).endGroup("Multiple comment: 200 lines");
add(bench, doubleComment["2000lines"]).endGroup("Multiple comment: 2000 lines");
bench.consoleLog().run();

Result:
image

@Llorx
Copy link
Contributor Author

Llorx commented Jun 14, 2025

Ok, I tried with different approaches, and I think that I found a good balance between performance an code, as performance is good (even better with single and multiple comment lines), has less code, uses const and avoids explicit loops.

I tested:

  • matchNegativeLookahead: Using a negative lookahead with const.
  • matchLoop: The original one using exec loops and let.
  • matchReduce: Using matchAll and reduce to get the last element with const.
  • matchArray: Using matchAll, Array.from and at to get the last element with const.
  • matchGlobal: Using a global match and at to get the last element with const, and then indexOf to substring the URL part.

Personally, matchGlobal is the one I like the most, as has no explicit loops, less code, uses const and has a good performance.

If the reviewer is OK with this, I can push a commit with the matchGlobal code.

EDIT: Added "no comment" benchmarks.

Benchmark code:

const { IsoBench } = require("iso-bench");

const negativeLookaheadRegex = /(\/[*/]#\s+sourceMappingURL=)(?<sourceMappingURL>[^\s]+)(?![\s\S]*\1)/;
const globalMatch = /\/[*/]#\s+sourceMappingURL=[^\s]+/g;
const multipleMatchRegex = /\/[*/]#\s+sourceMappingURL=(?<sourceMappingURL>[^\s]+)/g;

const noComment = {
    "2lines": "test string yay ok\n".repeat(2),
    "20lines": "test string yay ok\n".repeat(20),
    "200lines": "test string yay ok\n".repeat(200),
    "2000lines": "test string yay ok\n".repeat(2000)
};
const singleComment = {
    "2lines": `test string yay ok
//# sourceMappingURL=erfgasdfgasdfgasdgasdg`,

    "20lines": `${"test string yay ok\n".repeat(19)}
//# sourceMappingURL=erfgasdfgasdfgasdgasdg`,

    "200lines": `${"test string yay ok\n".repeat(199)}
//# sourceMappingURL=erfgasdfgasdfgasdgasdg`,

    "2000lines": `${"test string yay ok\n".repeat(1990)}
//# sourceMappingURL=erfgasdfgasdfgasdgasdg`
};
const doubleComment = {
    "2lines": `//# sourceMappingURL=g44tgbh463h643h6
//# sourceMappingURL=erfgasdfgasdfgasdgasdg`,

    "20lines": `${new Array(19).fill(0).map((_, i) => {
        return i === 10 ? `//# sourceMappingURL=g44tgbh463h643h6${i}` : `test string yay ok ${i}`;
    }).join("\n")}}
//# sourceMappingURL=erfgasdfgasdfgasdgasdg`,

    "200lines": `${new Array(199).fill(0).map((_, i) => {
        return i === 100 ? `//# sourceMappingURL=g44tgbh463h643h6${i}` : `test string yay ok ${i}`;
    }).join("\n")}}
//# sourceMappingURL=erfgasdfgasdfgasdgasdg`,

    "2000lines": `${new Array(1999).fill(0).map((_, i) => {
        return i === 1000 ? `//# sourceMappingURL=g44tgbh463h643h6${i}` : `test string yay ok ${i}`;
    }).join("\n")}}
//# sourceMappingURL=erfgasdfgasdfgasdgasdg`
};

// Test functions
function matchNegativeLookahead(code) {
    const match = negativeLookaheadRegex.exec(code);
    if (match == null || !match.groups) {
        return null;
    }
    return match.groups.sourceMappingURL;
}
function matchLoop(code) {
    let match;
    let lastMatch;
    while ((match = multipleMatchRegex.exec(code))) {
        lastMatch = match;
    }
    if (lastMatch == null || !lastMatch.groups) {
        return null;
    }
    return lastMatch.groups.sourceMappingURL;
}
function matchReduce(code) {
    const match = code.matchAll(multipleMatchRegex).reduce((_, match) => match, null);
    if (match == null || !match.groups) {
        return null;
    }
    return match.groups.sourceMappingURL;
}
function matchArray(code) {
    const match = Array.from(code.matchAll(multipleMatchRegex)).at(-1);
    if (match == null || !match.groups) {
        return null;
    }
    return match.groups.sourceMappingURL;
}
function matchGlobal(code) {
    const match = code.match(globalMatch)?.at(-1);
    if (match == null) {
        return null;
    }
    return match.substring(match.indexOf("=") + 1);
}
// End test functions

function add(bench, code) {
    return bench.add("match loop (original)", () => {
        return matchLoop(code);
    }).add("match global (new one I like)", () => {
        return matchGlobal(code);
    }).add("negative lookahead (meh on multiline)", () => {
        return matchNegativeLookahead(code);
    }).add("match reduce (meh)", () => {
        return matchReduce(code);
    }).add("match array (meh)", () => {
        return matchArray(code);
    });
}
const bench = new IsoBench();
add(bench, singleComment["2lines"]).endGroup("Single comment: 2 lines");
add(bench, singleComment["20lines"]).endGroup("Single comment: 20 lines");
add(bench, singleComment["200lines"]).endGroup("Single comment: 200 lines");
add(bench, singleComment["2000lines"]).endGroup("Single comment: 2000 lines");
add(bench, doubleComment["2lines"]).endGroup("Multiple comment: 2 lines");
add(bench, doubleComment["20lines"]).endGroup("Multiple comment: 20 lines");
add(bench, doubleComment["200lines"]).endGroup("Multiple comment: 200 lines");
add(bench, doubleComment["2000lines"]).endGroup("Multiple comment: 2000 lines");
add(bench, noComment["2lines"]).endGroup("No comment: 2 lines");
add(bench, noComment["20lines"]).endGroup("No comment: 20 lines");
add(bench, noComment["200lines"]).endGroup("No comment: 200 lines");
add(bench, noComment["2000lines"]).endGroup("No comment: 2000 lines");
bench.consoleLog().run();

Benchmark results:
image

@Llorx Llorx force-pushed the single-match-sourcemap branch from 7f929cf to dcd637a Compare June 14, 2025 16:36
@Llorx
Copy link
Contributor Author

Llorx commented Jun 14, 2025

I'm not sure what subsystem this applies to...

@Ethan-Arrowood
Copy link
Contributor

I think maybe src: is fine.

@Ethan-Arrowood
Copy link
Contributor

Could also use process:

If you ever are unsure, i recommend looking at what other PRs used that previously modified this code.

@Llorx
Copy link
Contributor Author

Llorx commented Jun 16, 2025

Ok, I'm going with lib as it was the very previous commit for these lines that I'm changing.

@Llorx Llorx force-pushed the single-match-sourcemap branch 2 times, most recently from 3c44634 to e034cec Compare June 16, 2025 19:34
Copy link
Member

@legendecas legendecas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These regular expressios are only applied to eval-ed sources. Would you mind adding a test like

{ name: 'source-map/output/source_map_eval.js' },
for multiple magic comments? Thanks

@Llorx Llorx force-pushed the single-match-sourcemap branch from 463084f to 0e809a6 Compare June 19, 2025 17:57
@Llorx
Copy link
Contributor Author

Llorx commented Jun 19, 2025

@legendecas I've modified the eval test. Do you think that's enough or you prefer a single file specific for this?

Copy link
Member

@legendecas legendecas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@Llorx Llorx force-pushed the single-match-sourcemap branch from 0e809a6 to af1d67b Compare June 21, 2025 06:58
@Llorx
Copy link
Contributor Author

Llorx commented Jun 21, 2025

@legendecas I noticed a problem with the negative lookahead regex. If you have mixed //# sourceMappingURL= and /* sourceMappingURL= comments it will fail, so instead of explaining the regexes, I went by simplifying the regexes and using the matchGlobal explained here #58704 (comment) which doesn't have that problem.

It also has more performance than the negative lookahead and simplified both the negative lookahead regex and the original exec loop code, so I think that is a better approach. What do you think?

@Llorx Llorx force-pushed the single-match-sourcemap branch 2 times, most recently from 71ac3c4 to b3bb2d8 Compare June 21, 2025 07:08
@Llorx
Copy link
Contributor Author

Llorx commented Jun 21, 2025

Ok, at this point I don't know what is happening. A linter about prototype pollution fails with StringPrototypeMatch and I'm not sure why as I'm using the primordial like all the other StringPrototypeX.

@Llorx Llorx force-pushed the single-match-sourcemap branch 2 times, most recently from d28e076 to 007cb03 Compare June 21, 2025 07:33
@legendecas
Copy link
Member

Ok, at this point I don't know what is happening. A linter about prototype pollution fails with StringPrototypeMatch and I'm not sure why as I'm using the primordial like all the other StringPrototypeX.

Checkout https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/match#description for explaination about the prototype pollution (i.e. RegExp.prototype[Symbol.match]() in this case). An alternative to StringPrototypeMatch is using RegExpPrototypeSymbolMatch directly

@Llorx Llorx force-pushed the single-match-sourcemap branch from 007cb03 to 2f9dbe2 Compare June 22, 2025 18:14
@Llorx
Copy link
Contributor Author

Llorx commented Jun 22, 2025

Checkout https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/match#description for explaination about the prototype pollution (i.e. RegExp.prototype[Symbol.match]() in this case). An alternative to StringPrototypeMatch is using RegExpPrototypeSymbolMatch directly

Ahhh makes sense! Thank you.

@Llorx Llorx force-pushed the single-match-sourcemap branch from 2f9dbe2 to 0eed088 Compare June 22, 2025 18:20
@Llorx Llorx force-pushed the single-match-sourcemap branch from 0eed088 to 78040e3 Compare June 22, 2025 18:24
@Llorx
Copy link
Contributor Author

Llorx commented Jun 22, 2025

Now it complains that RegExpPrototypeSymbolMatch looks up the exec property. I'm going to close the PR as I don't have time for this :-(

@Llorx Llorx closed this Jun 22, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs-ci PRs that need a full CI run.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants