url.parse seems to change depending on the characters used in the domain name #5832

@sam-github

First, the behaviour:

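| URL | host | path |
|---|---|---|
| `http://*/path` | | `/*/path` |
| `http://./path` | `.` | `/path` |
| `http://=/path` | | `/=/path` |
| `http://-/path` | `-` | `/path` |
| `http://0/path` | `0` | `/path` |
| `http://,/path` | | `/,/path` |
| `http://@/path` | | `/path` |
| `http://;/path` | | `;/path` |
| `http://[::1]/path` | `[::1]` | `/path` |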

I would expect the path to be /path in all of the above. From my point of view, random non-URL-syntax characters are being pushed into the path, and it's pretty surprising.

There are some statements in the test code that make this appear to be deliberate, but they don't justify the behaviour.

While it is true that `*` is not a valid domain according to the host parsing rules quoted, neither is `-`, `0`, or `.`. I would expect `*` to be treated like `.`, that is, returned in the host string.

I would not expect the URL parser to validate that domain names are well formed, though I would of course expect characters that are defined as part of the URL syntax not to be valid in a host.

I can implement my own URL parser that allows *, and I will for backwards compatibility, but I think this is a bit odd. URL is generally very lax in its parsing: it gives you the syntactic bits, and you get to validate whether they are correct for your use case. This is the first time it's failed my expectations.
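As a rough illustration of what such a lax parser could look like, here is a minimal sketch built on the component-splitting regular expression from RFC 3986, Appendix B. The name `laxSplit` is hypothetical and not part of Node's url module. The authority is returned as-is, including any userinfo or port, with no validation.

```javascript
// Regular expression from RFC 3986, Appendix B. It splits a URI reference
// into scheme, authority, path, query and fragment without validating them.
const URI_PARTS =
  /^(([^:\/?#]+):)?(\/\/([^\/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?/;

// Hypothetical helper: purely syntactic split, no host validation.
// Every group in the pattern is optional, so exec() always matches.
function laxSplit(input) {
  const m = URI_PARTS.exec(input);
  return {
    scheme: m[2],
    authority: m[4],
    path: m[5],
    query: m[7],
    fragment: m[9],
  };
}

const parts = laxSplit('http://*/path');
console.log(parts.authority, parts.path);
```

With this approach `*` stays in the authority and the path is `/path`; deciding whether `*` is acceptable as a host is left to the caller.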

  • Version: 0.10+
  • Platform: all
  • Subsystem: url

Labels: url (Issues and PRs related to the legacy built-in url module.)
