Description
Given an email encoded with an ISO-8859-1 charset, I'd expect the strings returned by the mail methods to be either:
- Encoded with ISO-8859-1
- Converted correctly to UTF-8
However, they are returned with ASCII-8BIT encoding, which doesn't seem right and causes problems when those strings are handled later.
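To make the two expected outcomes concrete, here is a minimal sketch using only Ruby's standard library (no mail gem). It assumes that String#unpack1("M") is a fair stand-in for the gem's quoted-printable decoding of the body in the reproduction below, since it also returns raw bytes tagged as ASCII-8BIT. The variable names are illustrative.

```ruby
# Sketch: the same decoded bytes under the actual and the two expected encodings.
# Assumption: String#unpack1("M") stands in for the mail gem's quoted-printable decoding.

qp_body = "R=F8dgr=F8d med fl=F8de p=E5!"

# What this report observes: raw bytes tagged as ASCII-8BIT (binary).
actual = qp_body.unpack1("M")

# Expectation 1: the same bytes, tagged with the part's declared charset.
as_latin1 = actual.dup.force_encoding(Encoding::ISO_8859_1)

# Expectation 2: the text transcoded to UTF-8.
as_utf8 = as_latin1.encode(Encoding::UTF_8)

puts actual.encoding     # ASCII-8BIT
puts as_latin1.encoding  # ISO-8859-1
puts as_utf8.encoding    # UTF-8
puts as_utf8             # Rødgrød med fløde på!
```

Expectation 1 keeps the bytes untouched and only fixes the label; expectation 2 also rewrites the bytes so that downstream code can assume UTF-8.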
How to reproduce
The following script illustrates the issue:
require "mail"

mail = Mail.new('
Content-Type: multipart/alternative; boundary="_000_AM0PR09MB233743E6638C8665A901BF08CA220AM0PR09MB2337eurp_"

--_000_AM0PR09MB233743E6638C8665A901BF08CA220AM0PR09MB2337eurp_
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

R=F8dgr=F8d med fl=F8de p=E5!
')

text_part = mail.text_part
charset = text_part.charset
puts "charset: #{charset.inspect}"

body_string = text_part.body.to_s
puts "encoding: #{body_string.encoding.inspect}"

# This triggers a conversion of the string to UTF-8, which then fails.
require "json"
JSON.generate({body: body_string})
Running it outputs:
charset: "iso-8859-1"
encoding: #<Encoding:ASCII-8BIT>
Traceback (most recent call last):
2: from repro2.rb:22:in `<main>'
1: from ~/.rvm/gems/ruby-2.6.5/gems/json-2.3.0/lib/json/common.rb:224:in `generate'
~/.rvm/gems/ruby-2.6.5/gems/json-2.3.0/lib/json/common.rb:224:in `generate': "\xF8" from ASCII-8BIT to UTF-8 (Encoding::UndefinedConversionError)
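The message in the traceback ("\xF8" from ASCII-8BIT to UTF-8) is the same error Ruby itself raises when transcoding a binary string that contains a byte above 0x7F. A small standalone sketch (the strings are made up for illustration) also shows why the problem only surfaces with non-ASCII text:

```ruby
# Sketch: the UndefinedConversionError in the traceback is ordinary Ruby
# transcoding behaviour; neither the mail gem nor the json gem is needed.

ascii_only = "Rodgrod med flode pa!".b   # only 7-bit bytes, tagged ASCII-8BIT
non_ascii  = "R\xF8dgr\xF8d".b           # contains 0xF8, tagged ASCII-8BIT

# An ASCII-only binary string is simply relabelled as UTF-8...
puts ascii_only.encode(Encoding::UTF_8).encoding   # UTF-8

# ...but a byte above 0x7F has no defined meaning in ASCII-8BIT,
# so Ruby cannot map it to a UTF-8 character and raises instead.
begin
  non_ascii.encode(Encoding::UTF_8)
rescue Encoding::UndefinedConversionError => e
  puts "#{e.class}: #{e.message}"
end
```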
While the exception comes from inside the JSON gem, I'd venture that the root cause is that body_string.encoding is #<Encoding:ASCII-8BIT> and not #<Encoding:ISO-8859-1>.
To verify this, we can add body_string.force_encoding(charset) before the JSON encoding, which makes the script run without exceptions.
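Until the gem returns correctly tagged strings, a workaround along those lines is to apply the part's declared charset yourself and then transcode to UTF-8. This is a minimal sketch under a few assumptions: body_to_utf8 is a hypothetical helper (not part of the mail gem), a missing or unrecognised charset falls back to UTF-8 with undecodable bytes replaced, and String#unpack1("M") stands in for the gem's quoted-printable decoding so the snippet runs without mail installed.

```ruby
require "json"

# Hypothetical helper (not part of the mail gem): label raw body bytes with the
# MIME part's declared charset, then transcode them to UTF-8.
def body_to_utf8(raw, charset)
  encoding =
    begin
      Encoding.find(charset.to_s) # case-insensitive, e.g. "iso-8859-1"
    rescue ArgumentError
      nil # charset missing or not known to Ruby
    end
  # Dummy encodings (e.g. UTF-7) cannot be transcoded, so treat them as unknown.
  encoding = nil if encoding&.dummy?

  text = raw.dup.force_encoding(encoding || Encoding::UTF_8)
  # Replace undecodable bytes instead of raising. #scrub guarantees this for the
  # UTF-8 fallback too, since encoding UTF-8 to UTF-8 may leave bad bytes alone.
  text.encode(Encoding::UTF_8, invalid: :replace, undef: :replace).scrub
end

# Stand-in for text_part.body.to_s: binary bytes decoded from quoted-printable.
raw = "R=F8dgr=F8d med fl=F8de p=E5!".unpack1("M")

body = body_to_utf8(raw, "iso-8859-1")
puts JSON.generate({ body: body })
```

Transcoding to UTF-8, rather than only calling force_encoding(charset), means code such as JSON.generate always receives the same encoding, whichever charset the part declared.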
Versions
- mail 2.8.0.edge
- ruby 2.6.5 and 2.7
Possibly related issues
I did look through the existing issues, and while I couldn't find one that matches this issue exactly, quite a few seem related. Many of them are older, though, and/or lack reproduction steps: