feat: use nh3 for HTML sanitization#10264
feat: use nh3 for HTML sanitization#10264nouralmaa wants to merge 22 commits intoietf-tools:mainfrom
Conversation
ietf/utils/html.py
Outdated
| # Allow the protocols/tags/attributes we specifically want, plus anything that nh3 declares | ||
| # to be safe. | ||
|
|
||
| acceptable_protocols = nh3.ALLOWED_URL_SCHEMES.union( |
There was a problem hiding this comment.
TODO check. this is more permissive than bleach's strict superset. however, unsure if this adds extra vulnerabilities since nh3 uses a better maintained parser https://github.com/rust-ammonia/ammonia
There was a problem hiding this comment.
@jennifer-richards @rjsparks any thoughts?
There was a problem hiding this comment.
I'm going to try to get a few other eyes on this
There was a problem hiding this comment.
I think adding tel: and ftp: (really?) are fine from a security perspective.
There was a problem hiding this comment.
We don't really need ftp, but tel might be useful.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #10264 +/- ##
==========================================
- Coverage 88.57% 88.39% -0.18%
==========================================
Files 317 325 +8
Lines 42544 43544 +1000
==========================================
+ Hits 37682 38492 +810
- Misses 4862 5052 +190 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
ietf/utils/html.py
Outdated
| "li", "ol", "p", "pre", "q", "s", "samp", "small", "span", "strike", "style", | ||
| "li", "ol", "p", "pre", "q", "s", "samp", "small", "span", "strike", | ||
| "strong", "sub", "sup", "table", "title", "tbody", "td", "tfoot", "th", "thead", | ||
| "tr", "tt", "u", "ul", "var" |
There was a problem hiding this comment.
xmp is also good
| "tr", "tt", "u", "ul", "var" | |
| "tr", "tt", "u", "ul", "var", "xmp" |
ietf/utils/html.py
Outdated
| # to be safe. | ||
|
|
||
| acceptable_protocols = nh3.ALLOWED_URL_SCHEMES.union( | ||
| {"http", "https", "mailto", "ftp", "xmpp"} |
There was a problem hiding this comment.
| {"http", "https", "mailto", "ftp", "xmpp"} | |
| {"ftp", "http", "https", "mailto", "tel", "xmpp"} |
Sort. Add "tel".
(If performance is dictated by order, move "https" to the top.)
ietf/utils/html.py
Outdated
| protocols=acceptable_protocols, | ||
| strip=True, | ||
| url_schemes=acceptable_protocols, | ||
| link_rel=None |
There was a problem hiding this comment.
This doesn't seem right to me. The default value is safer. Are there cases where links opened will need window.opener? I can't imagine that being necessary for user-generated content.
Same below.
There was a problem hiding this comment.
cool, changed it here as there were no rel attributes before
ietf/utils/html.py
Outdated
| protocols=acceptable_protocols, | ||
| strip=True, | ||
| _liberal_nh3_cleaner = nh3.Cleaner( | ||
| tags=acceptable_tags.union({"mg", "figure", "figcaption"}), |
There was a problem hiding this comment.
typo?
| tags=acceptable_tags.union({"mg", "figure", "figcaption"}), | |
| tags=acceptable_tags.union({"img", "figure", "figcaption"}), |
| """Returns the given HTML sanitized, and with the given tags removed.""" | ||
| allowed = acceptable_tags - set(t.lower() for t in tags) | ||
| return bleach.clean(html, tags=allowed, strip=True) | ||
| return nh3.clean(html, tags=allowed) |
There was a problem hiding this comment.
You might want to audit invocations of this function.
821b2a9 to
bf72278
Compare
|
Hi @nouralmaa - Thanks for continuing to work on this. One point on something that's not obvious - commits bf72278 and 9736b74 touch an old data migration (see the filename) - those will never run again, and its better to not touch these migration files as they should show what happened when they actually ran on the production database. If it's easy, please revert those commits. If it's not, we'll do it when this gets towards final review. When we have major changes to django (we may have one coming with django5) we tend to squash the migrations and that one would go away at that time. |
| '<a href="https://www.ietf.org" rel="nofollow">https://www.ietf.org</a>', | ||
| ) | ||
| self.assertEqual( | ||
| linkify("https://mailman3.ietf.org/mailman3/lists/[email protected]/"), |

fixes #10138