-
-
Notifications
You must be signed in to change notification settings - Fork 2
fix: address CVE ReDoS issue #20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Ensures proper formatting by adding a newline at the end of the file. This change helps maintain consistency and adheres to coding standards. (This commit message was AI-generated.) Signed-off-by: Jens Oliver Meiert <[email protected]>
Improve the regex logic to handle custom fragments more robustly by accounting for nested occurrences. This ensures better accuracy when temporarily replacing ignored fragments. [References: kangax/html-minifier#1135] (This commit message was AI-generated.) Signed-off-by: Jens Oliver Meiert <[email protected]>
Update package.json and package-lock.json to reflect the version change from 1.0.1 to 1.0.2. This ensures consistency across the project metadata. (This commit message was AI-generated.) Signed-off-by: Jens Oliver Meiert <[email protected]>
WalkthroughThis update adds two new CLI options to limit input length and quantifier repetition to prevent ReDoS attacks. The HTML minifier enforces these limits by checking input size and bounding regex quantifiers for custom fragments. The package version is bumped to 1.1.0, new tests verify ReDoS protection, and documentation is updated accordingly. Changes
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 ESLint
npm error Exit handler never called! ✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
package-lock.json
is excluded by!**/package-lock.json
📒 Files selected for processing (3)
cli.js
(1 hunks)package.json
(1 hunks)src/htmlminifier.js
(1 hunks)
🔇 Additional comments (2)
cli.js (1)
307-307
: LGTM! Good formatting practice.Adding a newline at the end of the file follows standard formatting conventions and improves file consistency.
package.json (1)
93-93
: Appropriate version bump for security patch.The patch version increment (1.0.1 → 1.0.2) correctly follows semantic versioning for a backward-compatible security fix.
Introduce input length validation to mitigate ReDoS risks in the custom fragments regex. Long inputs now fall back to a simpler, safer pattern. Added tests to ensure performance and correctness under various edge cases. (This commit message was AI-generated.) Signed-off-by: Jens Oliver Meiert <[email protected]>
Used Claude Code also to explain the “reasoning” behind the updated solution:
|
@DanielRuf, do you have a view on this? Would love to take you up on your offer to provide input! (I think this works, but do appreciate another opinion.) |
src/htmlminifier.js
Outdated
@@ -888,8 +888,19 @@ async function minifyHTML(value, options, partialMarkup) { | |||
return re.source; | |||
}); | |||
if (customFragments.length) { | |||
const reCustomIgnore = new RegExp('\\s*(?:' + customFragments.join('|') + ')+\\s*', 'g'); | |||
// temporarily replace custom ignored fragments with unique attributes | |||
// Safe approach: Use original pattern but with input length validation to prevent ReDoS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure about this. The original pattern is used three times here. Is this, how it should be?
A quick check with https://devina.io/redos-checker? should show you some possible ways to verify, if this resolves the aforementioned problem.
Using the original string and replacing *
with {0,30}
and +
with {1,30}
(just as demonstration) already makes a difference:
But if I set these lower and upper bounds for the expression in line 898 in these changes, it does not seem to resolve that:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the AI made it worse by simply using capturing groups without any reasnoable quantifiers and by using a normal capturing group instead of a non-capturing group.
In pseudo language, it seems to be this now:
If value.length > maxSafeLength
Then ""unsafeRegEx"
Else "duplicatedOriginalUnsafeRegex"
And the duplicatedOriginalUnsafeRegex becomes this:
After replacing the unlimited quantifiers, the checker itself runs into a timeout:
Only after setting some lower limits the checker says at least, that this might not be vulnerable:
So now it's more complex than before. At least any +
and *
is a way to reintroduce ReDos in this part of the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@DanielRuf, thank you so much!
I think the more complicated regex was a mistake—reverted to use the original regex indeed.
Does the reasoning hold that the issue is mostly one for large inputs? Then I do believe that to work (at least while lacking an alternative for all inputs)?
(Had run a few attempts here… not my usual cup of tea so I did use Claude Code here.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not aware of a reasonable length value. Especially since this is a bit hard to define.
My suggestion at the moment:
- change the default
customFragments
in the settings to a variant which is not vulnerable (with reasonable lower and upper limits ({0, upperLimit}
and{1, upperLimit}
) instead of+
and*
) to ReDoS - replace the other instances of
*
and+
with reasonable defaults, which can be configured ({0, upperLimit}
and{1, upperLimit}
) - add a warning to the documentation that any changed
customFragments
should be checked for ReDoS (in the end we have no control over what users define there) - maybe the code can also warn here or stop using the provided regular expression if it contains
+
and*
and continue without it - add a new option for general input length limitation but make it disabled by default to not break current setups
- on further iterations / versions implement an alternative solution which uses an array with the following logic:
- find all occurrences of the start tag (https://stackoverflow.com/questions/20798477/how-to-find-the-indexes-of-all-occurrences-of-an-element-in-array#20798567)
- find all occurrences of the end tag (https://stackoverflow.com/questions/20798477/how-to-find-the-indexes-of-all-occurrences-of-an-element-in-array#20798567)
- iterate over the positions and extract the parts between them from the input with https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/substring or slice (see the difference at https://stackoverflow.com/questions/2243824/what-is-the-difference-between-string-slice-and-string-substring)
- replace these from the input (not sure what exactly is the current logic) for minification / optimization with something else
- add the fragments and their content back
- deprecate the old logic and settings
That should result in a more linear or predictable processing time instead of exponential growth (due to catastrophic backtracking for example) in processing due to problematic regular expressions with unlimited quantifiers like +
and *
and the resulting compiled automatons.
Further resources on this topic:
- https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
- https://en.wikipedia.org/wiki/ReDoS
In general the best approach is often to not use regular expressions, when user-supplied input is accepted but to use finite and deterministic linear string manipulations.
Does that make sense?
Simplified the regular expression used for normal inputs to improve efficiency and maintainability. The updated regex is less prone to potential issues while retaining the necessary functionality. (This commit message was AI-generated.) Signed-off-by: Jens Oliver Meiert <[email protected]>
…, improve docs and tests\n\n- Use bounded quantifiers in default ignoreCustomFragments\n- Add maxInputLength option and warnings\n- Warn on unsafe custom regexes\n- Update docs and CLI\n- Add/extend tests for ReDoS prevention
Update package.json and package-lock.json to reflect the new version 1.1.0. This prepares for the release with any included changes. (This commit message was AI-generated.) Signed-off-by: Jens Oliver Meiert <[email protected]>
Add input length validation and bounded quantifiers to prevent ReDoS vulnerabilities in HTML minification. Warn users about potential risks of using unlimited quantifiers in custom fragments. This improves the security and stability of the minification process. (This commit message was AI-generated.) Signed-off-by: Jens Oliver Meiert <[email protected]>
Introduce `customFragmentQuantifierLimit` and `maxInputLength` to improve security by mitigating ReDoS attack risks. These options provide limits for regex quantifiers and input length. (This commit message was AI-generated.) Signed-off-by: Jens Oliver Meiert <[email protected]>
Updated the README to include new security-related options (`customFragmentQuantifierLimit` and `maxInputLength`) and detailed explanations on ReDoS protection. Added a dedicated "Security" section for clarity and user guidance. (This commit message was AI-generated.) Signed-off-by: Jens Oliver Meiert <[email protected]>
I think this needs more changes for the regular expressions. https://www.rexegg.com/regex-quantifiers.php#lazy_solution might contain some recommendations, but I think the way forward is a completely different approach as previously outlined. To be fair, I never used the lazy solution ( According to https://www.rexegg.com/regex-explosive-quantifiers.php even double-digit upper bounds can be problematic. Which just confirms that regular expressions for this project are not the ideal solution. |
You were quick, @DanielRuf 🙂 This should still be an improvement though, no? Reviewing your feedback, looking into a different approach… (Thank you…!) |
Signed-off-by: Jens Oliver Meiert <[email protected]>
@DanielRuf, worked on this for a few more hours and just stashed some more changes (#23), but may need to table this for the time being. My read is that this PR does contain an improvement—would I be mistaken? If not, I’d merge this and look into other improvements later. My impression is that this needs a major rewrite. |
Probably, yes.
Should be a bit better than before.
I totally agree. |
Updated the headings to follow a consistent format and restructured the security section for clarity and readability. Minor adjustments were made to improve information accessibility and invite contributions for further improvements. (This commit message was AI-generated.) Signed-off-by: Jens Oliver Meiert <[email protected]>
Lowered the `customFragmentQuantifierLimit` default from 1000 to 200 for improved protection against regular expression denial of service (ReDoS) attacks. Updated relevant documentation and CLI descriptions to reflect this change. Added detailed examples and enhanced guidance for defining safe custom fragment patterns. (This commit message was AI-generated.) Signed-off-by: Jens Oliver Meiert <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (2)
README.md (2)
96-96
: DocumentmaxInputLength
option
The newmaxInputLength
entry accurately reflects the default and purpose. Consider noting in the description that the default is “unlimited” when unset (undefined
).
159-171
: Add Security / ReDoS protection section
Clear security section with quantifier limits, input length caps, and pattern detection. For consistency, mention the default formaxInputLength
here (“disabled by default”) to match the quantifier bullet.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
README.md
(4 hunks)cli.js
(3 hunks)src/htmlminifier.js
(2 hunks)tests/minifier.spec.js
(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- cli.js
- src/htmlminifier.js
- tests/minifier.spec.js
🧰 Additional context used
🪛 LanguageTool
README.md
[misspelling] ~79-~79: This word is normally spelled with a hyphen.
Context: ...| caseSensitive
| Treat attributes in case sensitive manner (useful for custom HTML tags) | ...
(EN_COMPOUNDS_CASE_SENSITIVE)
[style] ~173-~173: Consider an alternative to strengthen your wording.
Context: ...se can lead to ReDoS vulnerabilities. (Further improvements are needed. Contributions welcome.) ##...
(IMPROVEMENTS_REFINEMENTS)
🔇 Additional comments (3)
README.md (3)
3-3
: Update README badge to npm version
Replacing the version badge with the npm shield is great for live status.
81-81
: DocumentcustomFragmentQuantifierLimit
option
The new table entry correctly lists the quantifier limit option and default; good clarity and default value.
190-196
: Potentially unsafe patterns examples
The unsafe patterns block is comprehensive and includes a closing fence; good coverage of edge cases.
ignoreCustomFragments: [ | ||
/<%[\s\S]{0,1000}?%>/, // JSP/ASP with explicit bounds | ||
/<\?php[\s\S]{0,5000}?\?>/, // PHP with bounds | ||
/\{\{[^}]{0,500}\}\}/ // Handlebars without nested braces | ||
] | ||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing closing code fence for safe patterns block
The safe patterns example starts with “javascript” but lacks the corresponding closing “
”, which will break Markdown rendering.
Apply:
/\{\{[^}]{0,500}\}\}/ // Handlebars without nested braces
]
+ ```
🤖 Prompt for AI Agents
In README.md around lines 180 to 185, the code block showing safe patterns
starts with a triple backtick but is missing the closing triple backtick. Add
the closing triple backtick ``` after the last line of the code block to
properly close it and fix Markdown rendering.
```javascript | ||
// Handlebars/Mustache | ||
ignoreCustomFragments: [/\{\{[\s\S]{0,1000}?\}\}/] | ||
|
||
// Liquid (Jekyll) | ||
ignoreCustomFragments: [/\{%[\s\S]{0,500}?%\}/, /\{\{[\s\S]{0,500}?\}\}/] | ||
|
||
// Angular | ||
ignoreCustomFragments: [/\{\{[\s\S]{0,500}?\}\}/] | ||
|
||
// Vue.js | ||
ignoreCustomFragments: [/\{\{[\s\S]{0,500}?\}\}/] | ||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing closing code fence for template engine examples
The template-engine config block begins with “```javascript” but is missing the closing triple backticks at the end.
Apply:
ignoreCustomFragments: [/\{\{[\s\S]{0,500}?\}\}/]
// Vue.js
ignoreCustomFragments: [/\{\{[\s\S]{0,500}?\}\}/]
+ ```
🤖 Prompt for AI Agents
In README.md around lines 200 to 212, the code block showing template engine
examples starts with triple backticks and "javascript" but lacks the closing
triple backticks at the end. Add the missing closing triple backticks after the
last line of the code block to properly close the fenced code block.
Thanks again, @DanielRuf, for your help! (I’ve just added a little acknowledgments section as well, with #27.) I’ll focus on testing this change, pulling up some more dependencies, and looking into other improvements, but would look forward to receiving your advice and collaborating again 🙏 |
Hi @DanielRuf—I totally didn’t make the connection between you and html-minifier-terser! I had reached out to various people there, wondering if and when html-minifier-terser would be maintained again. But if I understand comments there correctly, there are no such plans? |
I can neither confirm nor deny. To be frankly clear, I am not an active GitHub user anymore and I mainly just react to mentions if really needed. So I am not involved in this project anymore. Currently I have no maintainer status (currently I am just a normal external contributor, not collaborator or maintainer) and left the terser org on GitHub on my own. You can see the current status at the top right in the comments made by me: ![]() These two users (who were previously invited and added to the project by me as you can see at terser#79) have the needed rights to maintain the project and I can see that they replied / reacted to you at terser#197: ![]() ![]() I think they are open to get things forward with your help. More important things in life always happen, so any help by contributors is always very welcome. Maybe this clears things up a bit and you can work together with these two, resulting in new releases and you joining them as new additional maintainer. |
Thanks for sharing more context, @DanielRuf—much appreciated! Would be curious to learn what your focus is on now, btw—very open to continuing per email (and all good if this isn’t the time). |
@j9t I'm focusing now on IT security and Ethical Hacking (especially creating a new blog around this topic to show how real ethical hackers like myself work - not like most of the begbounty hunters). |
Would be curious to learn more, and follow that work! Also open to feature posts on Frontend Dogma—feel free to hit me up at any time! |
@j9t thanks for your interest. I can only be reached via Threema: https://threema.id/74SF7MW6?text= |
Summary by CodeRabbit
New Features
Bug Fixes
Chores
Tests
Documentation