fs: support different path name encodings #3519

Closed
@bnoordhuis

Continuing from #3401, it's clear that the way node.js handles path name encodings is sub-optimal. What is not clear is how to fix it. This issue is for discussing possible solutions.

A quick recap of the current situation:

  • node.js assumes UTF-8 in most - but not all - places.
  • UTF-8 is fine on Windows. Libuv converts UTF-8 to and from UTF-16, which is what the kernel expects.
  • UTF-8 is common but not universal on UNIX systems. Most file systems are character set agnostic; the encoding of a path name is normally a matter of convention. OS X's HFS+ is the most common exception to the rule.
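
To illustrate why assuming UTF-8 is a problem on such systems, here is a minimal sketch. The file name bytes are made up for the example, and it only exercises `Buffer`, not the `fs` module itself:

```javascript
// A file name as a UNIX kernel might hand it back: "café" encoded as
// ISO-8859-1. The trailing byte 0xE9 is not valid UTF-8 on its own.
// (The name is a made-up example.)
const rawName = Buffer.from([0x63, 0x61, 0x66, 0xe9]);

// Decoding as UTF-8 replaces the invalid byte with U+FFFD.
const nameAsString = rawName.toString('utf8');

// Re-encoding the string produces different bytes, so the result
// no longer names the same file.
const roundTripped = Buffer.from(nameAsString, 'utf8');

console.log(JSON.stringify(nameAsString));
console.log(rawName.equals(roundTripped));
```

Once the name has passed through a JS string this way, the original bytes cannot be recovered from it.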

Considerations:

  • Conversions should be zero-byte safe because most C APIs operate on zero-terminated strings.
  • JS strings are conceptually always UTF-16 but V8 accepts ISO-8859-1, UTF-8 and UTF-16 as input.
  • Conversion (to JS string) from ISO-8859-1 is lossless but conversion from UTF-8 and UTF-16 is not: invalid byte sequences are replaced with U+FFFD.
  • Conversely, conversion (from JS string) to UTF-8 and UTF-16 is lossless but conversion to ISO-8859-1 is not: out-of-range characters wrap around (U+0100, for example, becomes a zero byte), which can be insecure; see the bullet point about C APIs.
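
The ISO-8859-1 behaviour described in the last two points can be sketched with `Buffer`, where that encoding is called `'latin1'`. The `encodeZeroByteSafe` helper is hypothetical and only shows one possible guard; it is not an existing node.js API or a proposed fix:

```javascript
// ISO-8859-1 maps every byte to the code point with the same value, so
// decoding any byte sequence and encoding it again is lossless.
const allBytes = Buffer.from(Array.from({ length: 256 }, (_, i) => i));
const latin1RoundTrip = Buffer.from(allBytes.toString('latin1'), 'latin1');
console.log(allBytes.equals(latin1RoundTrip));

// Encoding to ISO-8859-1 keeps only the low 8 bits of each UTF-16 code
// unit, so U+0100 wraps around to a zero byte.
const wrapped = Buffer.from('a\u0100b', 'latin1');
console.log(wrapped);

// Hypothetical guard: refuse to pass on an encoded path that contains a
// zero byte, because a C API would silently truncate it at that point.
function encodeZeroByteSafe(str, encoding) {
  const bytes = Buffer.from(str, encoding);
  if (bytes.includes(0)) {
    throw new Error('encoded path contains a zero byte');
  }
  return bytes;
}
```

With this guard, `encodeZeroByteSafe('a\u0100b', 'latin1')` throws instead of handing a truncated path to the operating system.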

Labels: discuss, fs
