format: ensure ascii in uri, uri-reference #226

Open, wants to merge 5 commits into base: boon

@bernhardreiter bernhardreiter commented Jun 12, 2025

  • Add test cases for uri and iri validation.
  • Change validateURI() to additionally check that all chars are ASCII.
  • Add an additional check for disallowed ASCII chars for uri and uri-reference.
  • Introduce validateIRI() without ASCII range check

resolve #225

  * Add a test case for uri validation.
  * Change validateURI() to additionally check that all chars are ASCII.
 * add a code branch for iri and uri

@tschmidtb51 tschmidtb51 left a comment

@bernhardreiter Maybe you should add an iri.json test file as well.

@bernhardreiter
Author

Maybe you should add an iri.json test file as well.

I did, but forgot to add it to git. 🤦
Thanks for noticing!

Guess it makes sense to fix this for uri-reference as well.

 * Unify uri, uri-reference, iri, iri-reference in one validation
   function.
 * Move test for illegal `\` into the unified function for uri, uri-reference.
 * Add some more test cases.
  * check for more illegal chars
// [..] The ABNF notation defines its terminal values to be
// non-negative integers (codepoints) based on the US-ASCII coded
// character set
for _, r := range s {

Maybe use if strings.ContainsFunc(s, func(r rune) bool { return r > unicode.MaxASCII }) { instead?

Author

Tried it. I find:

        if strings.ContainsFunc(s, func(r rune) bool { return r > unicode.MaxASCII }) {
            return LocalizableError("has unescaped non-ASCII characters")
        }

slightly less readable (though this may be a matter of not being accustomed to it) than your original code from gocsaf/csaf#517. Is it more efficient?

Author

@s-l-teichmann told me that the new code is better in two regards: it works on runes, not only on bytes, and it matches a common Go style that has emerged over the last 1.5 years.

So I'll improve the pull request at this point.
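
For reference, the two variants discussed in this thread can be put side by side. This is a minimal sketch rather than the code from the pull request: the function names hasNonASCIILoop and hasNonASCII are made up for illustration, and strings.ContainsFunc needs Go 1.21 or later. Both versions iterate over runes, so they should agree on every input.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// hasNonASCIILoop (hypothetical name) reports whether s contains a rune
// above U+007F, using an explicit range loop over the string.
func hasNonASCIILoop(s string) bool {
	for _, r := range s {
		if r > unicode.MaxASCII {
			return true
		}
	}
	return false
}

// hasNonASCII (hypothetical name) expresses the same test with
// strings.ContainsFunc, as suggested in the review comment.
func hasNonASCII(s string) bool {
	return strings.ContainsFunc(s, func(r rune) bool { return r > unicode.MaxASCII })
}

func main() {
	for _, s := range []string{"https://example.com/a.pdf", "https://foobar®.pdf"} {
		fmt.Printf("%q loop=%v containsFunc=%v\n", s, hasNonASCIILoop(s), hasNonASCII(s))
	}
}
```

Which form reads better is a matter of taste; the ContainsFunc form mainly saves the explicit loop and early return.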

@santhosh-tekuri
Owner

santhosh-tekuri commented Jun 14, 2025

strings.ContainsAny(s, "<>\" {}|\\^`")

only check for non-ASCII; if someone wants strict checking, let them implement it.

the proposed changes add so many functions and hurt readability.
I suggest adding a single function: ensureASCII(func(v any) error) func(v any) error

and changing the initializers as below:

	"uri":                   {"uri", ensureASCII(validateURI)},
	"uri-reference":         {"uri-reference", ensureASCII(validateURIReference)},
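
A minimal sketch of what such a wrapper could look like, given only the signature quoted above. The validateURI here is a simplified stand-in and not the library's implementation, and plain errors are used in place of the library's LocalizableError; both are assumptions made to keep the example self-contained.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"strings"
	"unicode"
)

// validateURI is a simplified stand-in (assumption) for the existing
// validator: non-string values pass, strings must parse as absolute URLs.
func validateURI(v any) error {
	s, ok := v.(string)
	if !ok {
		return nil
	}
	u, err := url.Parse(s)
	if err != nil {
		return err
	}
	if !u.IsAbs() {
		return errors.New("relative url")
	}
	return nil
}

// ensureASCII wraps a validator so that string values containing any rune
// above U+007F are rejected before the wrapped validator runs.
func ensureASCII(next func(v any) error) func(v any) error {
	return func(v any) error {
		if s, ok := v.(string); ok &&
			strings.ContainsFunc(s, func(r rune) bool { return r > unicode.MaxASCII }) {
			return errors.New("has unescaped non-ASCII characters")
		}
		return next(v)
	}
}

func main() {
	check := ensureASCII(validateURI)
	fmt.Println(check("https://example.com/a.pdf"))
	fmt.Println(check("https://foobar®.pdf"))
}
```

The wrapper leaves the wrapped validators untouched, which is the appeal of this approach; the cost is that the string type assertion runs once in the wrapper and again in each wrapped function.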

@bernhardreiter
Author

bernhardreiter commented Jun 16, 2025

strings.ContainsAny(s, "<>\" {}|\\^`")

only check for non-ASCII

Only checking for non-ASCII would remove the check for an illegal \ that is currently in place in validateURIReference(). Do I understand you correctly that this check for the illegal ASCII char \ should be removed?

if someone wants strict checking, let them implement it

(We would like better checking, yes.)

the proposed changes add so many functions and hurt readability

One function is removed, so instead of two larger functions there is one with four code paths, plus four selector functions. I can see that the names are too close to each other.

Adding only ensureASCII would lead to code duplication, as you would need validateURIReference again and the check for s, ok := v.(string) at the beginning of all three functions.

@santhosh-tekuri
Owner

From the point of view of this library's implementation, uri and iri validations are the same; similarly, uri-reference and iri-reference validations are the same.

The intention of this PR is to add an additional validation for uri and uri-reference: all characters must be ASCII.

So just adding ensureASCII and wrapping the uri and uri-reference validations with it makes this easier to understand.

It seems you are under the impression that ensureASCII means alphabetic characters only.

@santhosh-tekuri santhosh-tekuri changed the title from "improve format uri validation" to "format: ensure ascii in uri, url-reference" on Jun 16, 2025
@santhosh-tekuri santhosh-tekuri changed the title from "format: ensure ascii in uri, url-reference" to "format: ensure ascii in uri, uri-reference" on Jun 16, 2025
@bernhardreiter
Author

From the point of view of this library's implementation, uri and iri validations are the same; similarly, uri-reference and iri-reference validations are the same.

I know that this has been the case, but RFC 3986 and RFC 3987 are different, and https://json-schema.org/draft/2020-12/json-schema-validation#name-resource-identifiers points to them specifically for uri and iri. So a correct validation must take the differences into account.

URIs can only consist of ASCII chars, and in addition some ASCII chars are not allowed (by the ABNF specification). Go's url.Parse does check for a number of invalid chars (like control characters), but not for the chars I've listed in the PR.

The code before was inconsistent in that only one invalid char was tested (\) and only for uri-reference, while it is also an invalid char for uris.

Aiming for the most correct and consistent code without much duplication brought me to the current revision. :)
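
The shape described here (one validation function with four code paths, plus four thin selector functions) might look roughly like the sketch below. It is an illustration under stated assumptions, not the pull request's code: validateResourceID, its two flags, and the error texts are hypothetical, plain errors stand in for the library's error type, and the disallowed-character check is applied only to the uri formats, following the PR description.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"strings"
	"unicode"
)

// disallowedASCII is the character set quoted in the review thread.
const disallowedASCII = "<>\" {}|\\^`"

// validateResourceID (hypothetical) is a single validation function.
// uriOnly enables the ASCII-range and disallowed-character checks;
// absolute additionally requires a scheme.
func validateResourceID(v any, uriOnly, absolute bool) error {
	s, ok := v.(string)
	if !ok {
		return nil // only strings are checked; the assertion lives in one place
	}
	if uriOnly {
		if strings.ContainsFunc(s, func(r rune) bool { return r > unicode.MaxASCII }) {
			return errors.New("has unescaped non-ASCII characters")
		}
		if strings.ContainsAny(s, disallowedASCII) {
			return errors.New("has disallowed ASCII characters")
		}
	}
	u, err := url.Parse(s)
	if err != nil {
		return err
	}
	if absolute && !u.IsAbs() {
		return errors.New("relative url")
	}
	return nil
}

// Four thin selectors, one per format name.
func validateURI(v any) error          { return validateResourceID(v, true, true) }
func validateURIReference(v any) error { return validateResourceID(v, true, false) }
func validateIRI(v any) error          { return validateResourceID(v, false, true) }
func validateIRIReference(v any) error { return validateResourceID(v, false, false) }

func main() {
	fmt.Println(validateURI("https://example.com/a|b"))
	fmt.Println(validateURIReference("a\\b"))
	fmt.Println(validateIRI("https://example.com/é"))
	fmt.Println(validateIRIReference("a/b"))
}
```

With this layout the backslash test covers both uri and uri-reference, which addresses the inconsistency mentioned above.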

@tschmidtb51 tschmidtb51 left a comment

LGTM

@santhosh-tekuri
Owner

There seems to be some confusion. I am talking in the context of this PR only. This PR is just to add the ASCII check. If you want other validations, do them in another PR.

Also, we are not going down the rabbit hole of implementing the entire RFC. If that is what is needed, I suggest implementing it as a separate library and plugging in the validation.

My intention is to make an atomic change that addresses just the current PR.

@bernhardreiter
Author

@santhosh-tekuri thanks for the further explanation.

I will think about how to split up the changes, even if it makes the state between the two changes less consistent.

Can you also clarify:

Also, we are not going down the rabbit hole of implementing the entire RFC

Disallowing illegal ASCII characters in the formats uri and uri-reference seems a good check according to the JSON Schema definition. More checks would be possible, especially considering schema, but I agree that it would be a little bit too much to include here. So I'd recommend doing the illegal ASCII chars validation. I assume that is okay for you. If not, you would need to remove the stray test for \.

@santhosh-tekuri
Owner

I will think about how to split up the changes, even if it makes the state between the two changes less consistent.

We already have parseURL, which is a superset of uri and iri. Just add parseURI, which adds any additional validations for uri and uri-reference.

If you are willing, I suggest creating a PR to https://github.com/json-schema-org/JSON-Schema-Test-Suite to add your extra tests.

@koplas

koplas commented Jun 23, 2025

I submitted a pull request with the additional tests: json-schema-org/JSON-Schema-Test-Suite#778.

@santhosh-tekuri santhosh-tekuri force-pushed the boon branch 2 times, most recently from 3a6ebaa to 74428e2 Compare July 9, 2025 14:03
Successfully merging this pull request may close these issues.

format uri assertion allows invalid https://foobar®.pdf
5 participants