-
Notifications
You must be signed in to change notification settings - Fork 701
feat: use nh3 for HTML sanitization #10264
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from 2 commits
0330e40
806448b
91e3410
bb4b6d3
2c7704b
a0665ac
356cedb
6042400
1aeb662
31c4153
9f0a249
4136809
c39659b
f2de311
153b742
90c47a7
fedd691
bf72278
9736b74
cbbd096
9cc2773
426cbaa
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
|
|
@@ -4,7 +4,7 @@ | |||||
| """Utilities for working with HTML.""" | ||||||
|
|
||||||
|
|
||||||
| import bleach | ||||||
| import nh3 | ||||||
| import html2text | ||||||
|
|
||||||
| import debug # pyflakes:ignore | ||||||
|
|
@@ -15,63 +15,60 @@ | |||||
| from ietf.utils.mime import get_mime_type | ||||||
|
|
||||||
|
|
||||||
| # Allow the protocols/tags/attributes we specifically want, plus anything that bleach declares | ||||||
| # to be safe. As of 2025-01-27, the explicit lists for protocols and tags are a strict superset | ||||||
| # of bleach's defaults. | ||||||
| acceptable_protocols = bleach.sanitizer.ALLOWED_PROTOCOLS.union( | ||||||
| # Allow the protocols/tags/attributes we specifically want, plus anything that nh3 declares | ||||||
| # to be safe. | ||||||
|
|
||||||
| acceptable_protocols = nh3.ALLOWED_URL_SCHEMES.union( | ||||||
| {"http", "https", "mailto", "ftp", "xmpp"} | ||||||
|
||||||
| {"http", "https", "mailto", "ftp", "xmpp"} | |
| {"ftp", "http", "https", "mailto", "tel", "xmpp"} |
Sort. Add "tel".
(If performance is dictated by order, move "https" to the top.)
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't seem right to me. The default value is safer. Are there cases where links opened will need window.opener? I can't imagine that being necessary for user-generated content.
Same below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cool, changed it here as there were no rel attributes before
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo?
| tags=acceptable_tags.union({"mg", "figure", "figcaption"}), | |
| tags=acceptable_tags.union({"img", "figure", "figcaption"}), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might want to audit invocations of this function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if @martinthomson did so himself, but it looks to me like the only use is via the removetags filter in htmlfilters.py, which is not used anywhere (I also checked the DBTemplates)
If so, we can just lose this method entirely.
If not, I think this changes remove_... to escape_... because nh3 doesn't have a strip option (but that's just based on reading docs)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO check. this is more permissive than bleach's strict superset. however, unsure if this adds extra vulnerabilities since nh3 uses a better maintained parser https://github.com/rust-ammonia/ammonia
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jennifer-richards @rjsparks any thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to try to get a few other eyes on this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think adding
tel:andftp:(really?) are fine from a security perspective.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't really need ftp, but tel might be useful.