-
-
Notifications
You must be signed in to change notification settings - Fork 32.6k
Avoid match looping in sourceMappingURL tag detection by using negative lookahead #58704
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I can't disagree with code simplicity, do you have benchmarks? I mean, regexp is not the nicest thing to maintain |
I didn't run any benchmark. Going to do a PoC with iso-bench and post the result.
Yeah, that's true... 😔 |
EDIT: Invalid test. Rewriting... |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #58704 +/- ##
==========================================
- Coverage 90.16% 90.07% -0.09%
==========================================
Files 637 640 +3
Lines 188001 188287 +286
Branches 36881 36915 +34
==========================================
+ Hits 169509 169602 +93
- Misses 11238 11400 +162
- Partials 7254 7285 +31 🚀 New features to boost your workflow:
|
Done. Summary:
I'm not sure the amount of files with multiple comments that we are going to find. Still, can process from thousands to millions of files per second. I agree that this is making the regex more difficult to read, though. Negative lookaheads are tricky. I don't know how do you feel about this. I don't event know how I feel about this. In non-performance-critical places I prefer simpler code, and I'm not sure if processing thousands of sourcemaps per second is considered low performance in this context. I don't know how many projects with files with 2000+ lines and multiple sourcemap comments are around there. Still, if the project has 500 files like that (wich are a lot in my opinion), the performance difference is not going to be noticeable I think. Benchmark code: const { IsoBench } = require("iso-bench");
const negativeLookaheadRegex = /(\/[*/]#\s+sourceMappingURL=)(?<sourceMappingURL>[^\s]+)(?![\s\S]*\1)/;
const multipleMatchRegex = /\/[*/]#\s+sourceMappingURL=(?<sourceMappingURL>[^\s]+)/g;
const singleComment = {
"2lines": `test string yay ok
//# sourceMappingURL=erfgasdfgasdfgasdgasdg`,
"20lines": `${`test string yay ok\n`.repeat(19)}
//# sourceMappingURL=erfgasdfgasdfgasdgasdg`,
"200lines": `${`test string yay ok\n`.repeat(199)}
//# sourceMappingURL=erfgasdfgasdfgasdgasdg`,
"2000lines": `${`test string yay ok\n`.repeat(1990)}
//# sourceMappingURL=erfgasdfgasdfgasdgasdg`
};
const doubleComment = {
"2lines": `//# sourceMappingURL=g44tgbh463h643h6
//# sourceMappingURL=erfgasdfgasdfgasdgasdg`,
"20lines": `${new Array(19).fill(0).map((_, i) => {
return i % 10 === 9 ? `//# sourceMappingURL=g44tgbh463h643h6${i}` : `test string yay ok ${i}`;
}).join("\n")}}
//# sourceMappingURL=erfgasdfgasdfgasdgasdg`,
"200lines": `${new Array(199).fill(0).map((_, i) => {
return i % 100 === 99 ? `//# sourceMappingURL=g44tgbh463h643h6${i}` : `test string yay ok ${i}`;
}).join("\n")}}
//# sourceMappingURL=erfgasdfgasdfgasdgasdg`,
"2000lines": `${new Array(1999).fill(0).map((_, i) => {
return i % 1000 === 999 ? `//# sourceMappingURL=g44tgbh463h643h6${i}` : `test string yay ok ${i}`;
}).join("\n")}}
//# sourceMappingURL=erfgasdfgasdfgasdgasdg`
};
function matchNegativeLookahead(code) {
const match = negativeLookaheadRegex.exec(code);
if (match == null || !match.groups) {
return null;
}
return match.groups.sourceMappingURL;
}
function matchLoop(code) {
let match;
let lastMatch;
while ((match = multipleMatchRegex.exec(code))) {
lastMatch = match;
}
if (lastMatch == null || !lastMatch.groups) {
return null;
}
return lastMatch.groups.sourceMappingURL;
}
function add(bench, code) {
return bench.add("negative lookahead", () => {
return matchNegativeLookahead(code);
}).add("match loop", () => {
return matchLoop(code);
});
}
const bench = new IsoBench();
add(bench, singleComment["2lines"]).endGroup("Single comment: 2 lines");
add(bench, singleComment["20lines"]).endGroup("Single comment: 20 lines");
add(bench, singleComment["200lines"]).endGroup("Single comment: 200 lines");
add(bench, singleComment["2000lines"]).endGroup("Single comment: 2000 lines");
add(bench, doubleComment["2lines"]).endGroup("Multiple comment: 2 lines");
add(bench, doubleComment["20lines"]).endGroup("Multiple comment: 20 lines");
add(bench, doubleComment["200lines"]).endGroup("Multiple comment: 200 lines");
add(bench, doubleComment["2000lines"]).endGroup("Multiple comment: 2000 lines");
bench.consoleLog().run(); |
Ok, I tried with different approaches, and I think that I found a good balance between performance an code, as performance is good (even better with single and multiple comment lines), has less code, uses I tested:
Personally, If the reviewer is OK with this, I can push a commit with the EDIT: Added "no comment" benchmarks. Benchmark code: const { IsoBench } = require("iso-bench");
const negativeLookaheadRegex = /(\/[*/]#\s+sourceMappingURL=)(?<sourceMappingURL>[^\s]+)(?![\s\S]*\1)/;
const globalMatch = /\/[*/]#\s+sourceMappingURL=[^\s]+/g;
const multipleMatchRegex = /\/[*/]#\s+sourceMappingURL=(?<sourceMappingURL>[^\s]+)/g;
const noComment = {
"2lines": "test string yay ok\n".repeat(2),
"20lines": "test string yay ok\n".repeat(20),
"200lines": "test string yay ok\n".repeat(200),
"2000lines": "test string yay ok\n".repeat(2000)
};
const singleComment = {
"2lines": `test string yay ok
//# sourceMappingURL=erfgasdfgasdfgasdgasdg`,
"20lines": `${"test string yay ok\n".repeat(19)}
//# sourceMappingURL=erfgasdfgasdfgasdgasdg`,
"200lines": `${"test string yay ok\n".repeat(199)}
//# sourceMappingURL=erfgasdfgasdfgasdgasdg`,
"2000lines": `${"test string yay ok\n".repeat(1990)}
//# sourceMappingURL=erfgasdfgasdfgasdgasdg`
};
const doubleComment = {
"2lines": `//# sourceMappingURL=g44tgbh463h643h6
//# sourceMappingURL=erfgasdfgasdfgasdgasdg`,
"20lines": `${new Array(19).fill(0).map((_, i) => {
return i === 10 ? `//# sourceMappingURL=g44tgbh463h643h6${i}` : `test string yay ok ${i}`;
}).join("\n")}}
//# sourceMappingURL=erfgasdfgasdfgasdgasdg`,
"200lines": `${new Array(199).fill(0).map((_, i) => {
return i === 100 ? `//# sourceMappingURL=g44tgbh463h643h6${i}` : `test string yay ok ${i}`;
}).join("\n")}}
//# sourceMappingURL=erfgasdfgasdfgasdgasdg`,
"2000lines": `${new Array(1999).fill(0).map((_, i) => {
return i === 1000 ? `//# sourceMappingURL=g44tgbh463h643h6${i}` : `test string yay ok ${i}`;
}).join("\n")}}
//# sourceMappingURL=erfgasdfgasdfgasdgasdg`
};
// Test functions
function matchNegativeLookahead(code) {
const match = negativeLookaheadRegex.exec(code);
if (match == null || !match.groups) {
return null;
}
return match.groups.sourceMappingURL;
}
function matchLoop(code) {
let match;
let lastMatch;
while ((match = multipleMatchRegex.exec(code))) {
lastMatch = match;
}
if (lastMatch == null || !lastMatch.groups) {
return null;
}
return lastMatch.groups.sourceMappingURL;
}
function matchReduce(code) {
const match = code.matchAll(multipleMatchRegex).reduce((_, match) => match, null);
if (match == null || !match.groups) {
return null;
}
return match.groups.sourceMappingURL;
}
function matchArray(code) {
const match = Array.from(code.matchAll(multipleMatchRegex)).at(-1);
if (match == null || !match.groups) {
return null;
}
return match.groups.sourceMappingURL;
}
function matchGlobal(code) {
const match = code.match(globalMatch)?.at(-1);
if (match == null) {
return null;
}
return match.substring(match.indexOf("=") + 1);
}
// End test functions
function add(bench, code) {
return bench.add("match loop (original)", () => {
return matchLoop(code);
}).add("match global (new one I like)", () => {
return matchGlobal(code);
}).add("negative lookahead (meh on multiline)", () => {
return matchNegativeLookahead(code);
}).add("match reduce (meh)", () => {
return matchReduce(code);
}).add("match array (meh)", () => {
return matchArray(code);
});
}
const bench = new IsoBench();
add(bench, singleComment["2lines"]).endGroup("Single comment: 2 lines");
add(bench, singleComment["20lines"]).endGroup("Single comment: 20 lines");
add(bench, singleComment["200lines"]).endGroup("Single comment: 200 lines");
add(bench, singleComment["2000lines"]).endGroup("Single comment: 2000 lines");
add(bench, doubleComment["2lines"]).endGroup("Multiple comment: 2 lines");
add(bench, doubleComment["20lines"]).endGroup("Multiple comment: 20 lines");
add(bench, doubleComment["200lines"]).endGroup("Multiple comment: 200 lines");
add(bench, doubleComment["2000lines"]).endGroup("Multiple comment: 2000 lines");
add(bench, noComment["2lines"]).endGroup("No comment: 2 lines");
add(bench, noComment["20lines"]).endGroup("No comment: 20 lines");
add(bench, noComment["200lines"]).endGroup("No comment: 200 lines");
add(bench, noComment["2000lines"]).endGroup("No comment: 2000 lines");
bench.consoleLog().run(); |
7f929cf
to
dcd637a
Compare
I'm not sure what subsystem this applies to... |
I think maybe |
Could also use If you ever are unsure, i recommend looking at what other PRs used that previously modified this code. |
Ok, I'm going with lib as it was the very previous commit for these lines that I'm changing. |
3c44634
to
e034cec
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These regular expressios are only applied to eval-ed sources. Would you mind adding a test like
{ name: 'source-map/output/source_map_eval.js' }, |
463084f
to
0e809a6
Compare
@legendecas I've modified the eval test. Do you think that's enough or you prefer a single file specific for this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
0e809a6
to
af1d67b
Compare
@legendecas I noticed a problem with the negative lookahead regex. If you have mixed It also has more performance than the negative lookahead and simplified both the negative lookahead regex and the original exec loop code, so I think that is a better approach. What do you think? |
71ac3c4
to
b3bb2d8
Compare
Ok, at this point I don't know what is happening. A linter about prototype pollution fails with |
d28e076
to
007cb03
Compare
Checkout https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/match#description for explaination about the prototype pollution (i.e. |
007cb03
to
2f9dbe2
Compare
Ahhh makes sense! Thank you. |
2f9dbe2
to
0eed088
Compare
0eed088
to
78040e3
Compare
Now it complains that RegExpPrototypeSymbolMatch looks up the exec property. I'm going to close the PR as I don't have time for this :-( |
I think that a single regex is better than looping a regex multiple times. The code is smaller, using const and such.