Node.js / HTTP-Parser not handling UTF-8 encoded HTTP header values  #17390

@Mickael-van-der-Beek

Hi,

I've encountered an issue regarding the way HTTP header values are decoded.
The HTTP Parser project might be a better place for this issue, but I thought I'd post here first.

Currently it would seem that Node.js is decoding HTTP header values as US-ASCII / ASCII-7.

This becomes an issue now that browsers and servers have started supporting UTF-8 values as well.

A simple example would be a website with a URL that redirects to a non-percent-encoded UTF-8 URL, e.g.:

const http = require('http');
const net = require('net');

const HOST = '127.0.0.1';
const PORT = 3000;

// Raw TCP server so the response is written to the socket as UTF-8 bytes,
// without any header validation.
const server = net.createServer(socket => {
  socket.end([
    'HTTP/1.1 301 Moved Permanently',
    'Location: /fÖÖbÃÃr',
    '\r\n'
  ].join('\r\n'));
});

server.listen(PORT, HOST, () => {
  const req = http.request({
    hostname: HOST,
    port: PORT,
    path: '/'
  }, res => {
    // Header value as decoded by the HTTP parser.
    console.log(`Location: ${res.headers.location}`);
    // Same value turned back into bytes and decoded as UTF-8.
    console.log(`Location: ${Buffer.from(res.headers.location, 'binary').toString('utf8')}`);
  });

  req.end();
});

The first log prints: Location: /fÃ�Ã�bÃ�Ã�r (there are invisible characters next to the Ã's).
The second log prints: Location: /fÖÖbÃÃr, which is the correct and expected result.
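
The difference between the two logs can be reproduced without any network code. The following is only a sketch of what appears to happen, assuming the parser maps each received byte to one character (which is what the 'binary' round trip suggests); the variable names are made up for illustration:

```javascript
// The redirect target from the example: '/fÖÖbÃÃr'
const sent = '/f\u00d6\u00d6b\u00c3\u00c3r';

// Bytes on the wire when the server writes the string as UTF-8.
// Each of Ö and Ã takes two bytes, so 8 characters become 12 bytes.
const wire = Buffer.from(sent, 'utf8');

// Assumption: the header is decoded one byte per character ('latin1' is
// the same encoding as 'binary'). Ö (0xC3 0x96) becomes 'Ã' plus an
// invisible control character.
const misread = wire.toString('latin1');

// Turning the characters back into bytes and decoding them as UTF-8
// recovers the original value.
const repaired = Buffer.from(misread, 'latin1').toString('utf8');

console.log(wire.length, misread.length, repaired === sent);
```

The invisible characters mentioned above would then be U+0096 and U+0083, the second bytes of the UTF-8 sequences for Ö and Ã.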

In this example, to follow the redirect, you first need to create a buffer from the header value using the 'binary' encoding and then convert it to its 'utf8' string representation.
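
As a userland workaround, that conversion could be wrapped in a small helper. This is a minimal sketch, not something Node.js provides: `decodeHeaderValue` is a hypothetical name, and it assumes the value came from `res.headers` (so every character fits in one byte) and that a Node.js version with `util.TextDecoder` and ICU support for the `fatal` option is used. When the bytes are not valid UTF-8, it returns the value unchanged instead of producing replacement characters:

```javascript
const { TextDecoder } = require('util');

// Hypothetical helper, not part of Node.js core.
// Re-reads a header value as bytes and decodes them as UTF-8.
function decodeHeaderValue(value) {
  // 'latin1' is the same encoding as 'binary': one byte per character.
  const bytes = Buffer.from(value, 'latin1');
  try {
    // fatal: true makes decode() throw on invalid UTF-8.
    return new TextDecoder('utf-8', { fatal: true }).decode(bytes);
  } catch (err) {
    // Not valid UTF-8 (for example genuine ISO-8859-1): keep the original.
    return value;
  }
}

// Simulate what the first log in the example prints, then repair it.
const received = Buffer.from('/f\u00d6\u00d6b\u00c3\u00c3r', 'utf8').toString('latin1');
console.log(decodeHeaderValue(received));
```

Plain ASCII values pass through unchanged, so the helper can be applied to every header a redirect-following client reads.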

RFC 2616, the earlier HTTP/1.1 specification, seemed to allow any byte value, with a few restrictions on control characters:

TEXT = <any OCTET except CTLs, but including LWS>

cf: https://tools.ietf.org/html/rfc7230#section-3.2.6
cf: https://tools.ietf.org/html/rfc2616#section-2.2

Its successor, RFC 7230, seems to change that and restrict header values to US-ASCII / ASCII-7:

Non-US-ASCII content in header fields and the reason phrase has been obsoleted and made opaque (the TEXT rule was removed). (Section 3.2.6)

cf: https://tools.ietf.org/html/rfc7230#appendix-A.2
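
For reference, RFC 7230 Section 3.2.6 still defines `obs-text = %x80-FF`, so such bytes remain syntactically possible in field values even though their meaning is left opaque. A minimal sketch that flags header values containing them, assuming the value is a string with one character per received byte (the function name is hypothetical):

```javascript
// Hypothetical helper: true if a header value contains any obs-text
// byte (0x80-0xFF), i.e. content outside US-ASCII.
function containsObsText(headerValue) {
  return /[\x80-\xff]/.test(headerValue);
}

// '/fÖ' sent as UTF-8 and read one byte per character.
const nonAscii = Buffer.from('/f\u00d6', 'utf8').toString('latin1');

console.log(containsObsText('/foobar'));
console.log(containsObsText(nonAscii));
```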

I would therefore expect most Node.js HTTP clients to fail on the example I provided above.

Since browsers seem to support it and servers have started sending it (I've seen examples in the wild), I think we can say that it has become a de facto standard, and it would be nice if either Node.js core or HTTP Parser supported reading HTTP header values as UTF-8 by default.

  • Version: Node.js v6.x and v8.x
  • Platform: OSX Darwin Kernel Version 15.6.0
  • Subsystem: http, http-parser
