Description
Hi,
I've encountered an issue regarding the way HTTP header values are decoded.
The HTTP Parser project might be a better place to post this issue to but I thought I'd post here first.
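Currently, Node.js does not seem to decode HTTP header values as UTF-8: each byte is read as a single character (effectively Latin-1, which Node.js calls 'binary'), so only US-ASCII / ASCII-7 values survive intact.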
This becomes an issue now that browsers and servers have started supporting UTF-8 values as well.
A simple example is a website with a URL that redirects to a non-percent-encoded UTF-8 URL, e.g.:
const http = require('http');
const net = require('net');

const HOST = '127.0.0.1';
const PORT = 3000;

// Raw TCP server, so the response is written to the socket as-is (UTF-8 encoded).
const server = net.createServer(socket => {
  socket.end([
    'HTTP/1.1 301 Moved Permanently',
    'Location: /fÖÖbÃÃr',
    '\r\n'
  ].join('\r\n'));
});

server.listen(PORT, HOST, () => {
  const req = http.request({
    hostname: HOST,
    port: PORT,
    path: '/'
  }, res => {
    // Header value as decoded by Node.js.
    console.log(`Location: ${res.headers.location}`);
    // Same bytes, re-read as UTF-8.
    console.log(`Location: ${Buffer.from(res.headers.location, 'binary').toString('utf8')}`);
  });
  req.end();
});
The first log will produce: Location: /fÃÃbÃÃr
(there are invisible control characters next to the Ã's, because each two-byte UTF-8 sequence is read as two separate characters).
The second log will produce: Location: /fÖÖbÃÃr, which is the correct and expected result.
In this example, to follow the redirect, you'd first need to instantiate a buffer from the header value in 'binary' encoding and then stringify it to its 'utf8' representation.
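As a rough sketch of that workaround, the two steps can be wrapped in a small helper. `reinterpretAsUtf8` is a hypothetical name, not a Node.js API, and falling back to the untouched value when the bytes are not valid UTF-8 is only an assumption about what a caller would want.

```javascript
// Hypothetical helper, not part of Node.js core.
function reinterpretAsUtf8(value) {
  // Recover the raw bytes, one byte per character ('binary' is an alias for 'latin1').
  const bytes = Buffer.from(value, 'latin1');
  const decoded = bytes.toString('utf8');
  // Invalid UTF-8 decodes to U+FFFD; in that case keep the original value (assumption).
  return decoded.includes('\uFFFD') ? value : decoded;
}

// Simulate what the client sees for a UTF-8 encoded header value.
const received = Buffer.from('/fÖÖbÃÃr', 'utf8').toString('latin1');
console.log(`Location: ${reinterpretAsUtf8(received)}`);
```

Note that a header that really is Latin-1 (for example a lone 0xE9 byte for "é") is returned unchanged by this sketch, since it cannot be told apart from broken UTF-8 in any better way.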
RFC2616, which defined HTTP/1.1, seemed to allow any byte value in header values, with a few restrictions on control characters:
TEXT = <any OCTET except CTLs, but including LWS>
cf: https://tools.ietf.org/html/rfc7230#section-3.2.6
cf: https://tools.ietf.org/html/rfc2616#section-2.2
The follow-up update to HTTP, RFC7230, seems to change that and restrict header values to US-ASCII / ASCII-7:
Non-US-ASCII content in header fields and the reason phrase has been obsoleted and made opaque (the TEXT rule was removed). (Section 3.2.6)
cf: https://tools.ietf.org/html/rfc7230#appendix-A.2
I would therefore expect most Node.js HTTP clients to fail on the example I provided above.
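To make that concrete, here is a sketch of what a client would have to do before re-requesting the redirect target: recover the UTF-8 text, then percent-encode the non-ASCII characters so the path is plain ASCII again. `toRequestPath` is a made-up name, and encoding only the non-ASCII runs (so existing `%XX` escapes are left alone) is a design assumption, not something Node.js does for you. It also assumes the header bytes are valid UTF-8.

```javascript
// Hypothetical helper: turns a raw `Location` header value into an ASCII-only path.
function toRequestPath(rawLocation) {
  // Undo the byte-per-character decoding, then read the bytes as UTF-8.
  const text = Buffer.from(rawLocation, 'latin1').toString('utf8');
  // Percent-encode only the non-ASCII runs; existing %XX escapes stay untouched.
  return text.replace(/[^\x00-\x7F]+/g, run => encodeURIComponent(run));
}

// Simulated header value as the client would receive it.
const raw = Buffer.from('/fÖÖbÃÃr?q=%20', 'utf8').toString('latin1');
console.log(toRequestPath(raw));
```

The result could then be passed as `path` to a second `http.request` call.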
Since browsers seem to support UTF-8 header values and servers have started sending them (I've seen examples in the wild), I think we can say that this has become a de facto standard. It would be nice if either Node.js core or HTTP Parser supported reading HTTP header values as UTF-8 by default.
- Version: Node.js v6.x and v8.x
- Platform: OSX Darwin Kernel Version 15.6.0
- Subsystem: http, http-parser