-
Notifications
You must be signed in to change notification settings - Fork 116
format: ensure ascii in uri, uri-reference #226
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: boon
Are you sure you want to change the base?
format: ensure ascii in uri, uri-reference #226
Conversation
* Add a test case for uri validation. * Change validateURI() to additionally check that all chars are ASCII.
* add a code branch for iri and uri
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bernhardreiter Maybe you should add a iri.json
as test file as well.
I did, but forgot to add it to git. 🤦 Guess it makes sense to fix this for uri-reference as well. |
* Unify uri, uri-reference, iri, iri-reference in one validation function. * Move test for illegal `\` into the unified function for uri, uri-reference. * Add some more test cases.
* check for more illegal chars
// [..] The ABNF notation defines its terminal values to be | ||
// non-negative integers (codepoints) based on the US-ASCII coded | ||
// character set | ||
for _, r := range s { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe use if strings.ContainsFunc(s, func(r rune) bool { return r > unicode.MaxASCII }) {
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tried it, I'll find:
if strings.ContainsFunc(s, func(r rune) bool { return r > unicode.MaxASCII }) {
return LocalizableError("has unescaped non-ASCII characters")
}
slightly less readable (but this may be a matter of not being accustomed to it) than your original code from gocsaf/csaf#517 . Is it more efficient?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@s-l-teichmann told me that the new code is better in two regards: it works on runes, not on bytes only and it matches a common go style that has come up in the last 1.5 years.
So I'll improve the pull request at this point.
only check of non-ascii; if some one wants strict checking let them implement it; the changes proposed are adding so many functions and confusing the the readability and change the initializers as below:
|
only checking for non-ascii would remove a check for an illegal
(We would like better checking, yes.)
One function is removed, so instead of two larger functions there is one with four code paths. And four selector functions. I can see that the names are too close to each other. Adding only |
from the point this library implementation: uri, iri validations are same, similarly urn-reference, ire-reference validations are same. the intension of this PR is to add additional validation for uri, and uri-reference that all characters must be ascii. so just adding it seems you are confused that |
I know that this has been the case, but as RFC 3986 and 3987 are different and https://json-schema.org/draft/2020-12/json-schema-validation#name-resource-identifiers points to them specifically for uri and iri. So a correct validation must take the differences into account. URIs can only consist out of ASCII chars and in addition some ASCII chars are not allowed (by the ABNF specification). Go's url.Parse does check for a number of invalid chars (like control characters), but not for the chars I've listed in the PR. The code before was inconsistent in that only one invalid char was tested ( Considering the most correct, consistent and without much duplication code, brought me to the current revision. :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
seems there is some confusion. I am talking in context of only this PR. This PR is just to add ascii check. if you want other validations do it another PR. also we are not going into the rabbit hole of implementing entire RFC. if that is the case I suggest to implement as separate library and plugin the validation. my intension is to make atomic change which just addresses the current PR. |
@santhosh-tekuri thanks for the further explanation. I will think about how to split up the changes - even it makes the between two changes less consistent. Can you also clarify:
Disallowing illegal ASCII characters in format |
we already have if you are willing I suggest to create PR to https://github.com/json-schema-org/JSON-Schema-Test-Suite to add your extra tests. |
I submitted a pull request with the additional tests: json-schema-org/JSON-Schema-Test-Suite#778. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is already a test file in the json schema test suite: https://github.com/json-schema-org/JSON-Schema-Test-Suite/blob/48461fc3568972801b40eaccc428a31bce338f6e/tests/draft-next/optional/format/iri-reference.json. Maybe we should use this as it contains more tests.
3a6ebaa
to
74428e2
Compare
resolve #225