Mail::Body#to_s uses wrong String encoding #1413

@koppen

Description

Given an email encoded with an ISO-8859-1 charset, I'd expect the strings coming out of mail methods to be either:

  1. Encoded with ISO-8859-1
  2. Converted correctly to UTF-8

However, they are returned with an ASCII-8BIT encoding, which doesn't seem right and which causes problems when those strings are handled later.
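To make the two expected outcomes concrete, here is a minimal sketch in plain Ruby, without the mail gem. The byte string is hand-written to match the sample text used in this report, and the variable names are only illustrative:

```ruby
# Raw ISO-8859-1 bytes of the sample text, tagged as binary (ASCII-8BIT),
# similar to what this report observes for the body string.
raw = "R\xF8dgr\xF8d med fl\xF8de p\xE5!".b

# Expected outcome 1: the same bytes, tagged with the declared charset.
latin1 = raw.dup.force_encoding("ISO-8859-1")

# Expected outcome 2: the text transcoded to UTF-8 (the bytes change, the text does not).
utf8 = latin1.encode("UTF-8")

puts raw.encoding.inspect
puts latin1.encoding.inspect
puts utf8.encoding.inspect
puts utf8
```

Either of the last two strings can be handed to code that expects text. The first one carries no charset information, so Ruby has no way to transcode it.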

How to reproduce

The following script illustrates the issue:

require "mail"

mail = Mail.new('
Content-Type: multipart/alternative; boundary="_000_AM0PR09MB233743E6638C8665A901BF08CA220AM0PR09MB2337eurp_"

--_000_AM0PR09MB233743E6638C8665A901BF08CA220AM0PR09MB2337eurp_
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

R=F8dgr=F8d med fl=F8de p=E5!
')
text_part = mail.text_part

charset = text_part.charset
puts "charset: #{charset.inspect}"

body_string = text_part.body.to_s
puts "encoding: #{body_string.encoding.inspect}"

# This triggers a conversion of the string to UTF-8, which then fails.
require "json"
JSON.generate({body: body_string})

Running it outputs

charset: "iso-8859-1"
encoding: #<Encoding:ASCII-8BIT>
Traceback (most recent call last):
	2: from repro2.rb:22:in `<main>'
	1: from ~/.rvm/gems/ruby-2.6.5/gems/json-2.3.0/lib/json/common.rb:224:in `generate'
~/.rvm/gems/ruby-2.6.5/gems/json-2.3.0/lib/json/common.rb:224:in `generate': "\xF8" from ASCII-8BIT to UTF-8 (Encoding::UndefinedConversionError)

While the exception comes from inside the JSON gem, I suspect the root cause is that body_string.encoding is #<Encoding:ASCII-8BIT> and not #<Encoding:ISO-8859-1>. To verify this, we can add

body_string.force_encoding(charset)

before performing the JSON encoding, which makes the script run without exceptions.
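As a stopgap on the caller's side, the same idea can be wrapped in a small helper. This is only a sketch: `to_utf8` is a hypothetical name and not part of the mail gem, and the quoted-printable line is decoded here with core Ruby's `unpack1("M")` so the example runs without the gem. Whether mail decodes the body the same way internally is an assumption.

```ruby
require "json"

# Hypothetical helper (not part of the mail gem): tag the raw bytes with the
# charset declared in the part's Content-Type, then transcode to UTF-8.
def to_utf8(bytes, charset)
  bytes.dup.force_encoding(charset).encode("UTF-8")
end

# Decode the quoted-printable line from the report using core Ruby.
decoded = "R=F8dgr=F8d med fl=F8de p=E5!".unpack1("M")
puts decoded.encoding.inspect

# Without charset information, transcoding the binary string fails.
begin
  decoded.encode("UTF-8")
rescue Encoding::UndefinedConversionError => e
  puts "without a charset: #{e.class}"
end

# With the declared charset applied first, JSON generation succeeds.
body = to_utf8(decoded, "iso-8859-1")
puts JSON.generate({ body: body })
```

The helper duplicates the string before calling `force_encoding`, so the caller's original string keeps its encoding tag. That matters if the same body is also written out as raw bytes elsewhere.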

Versions

  • mail 2.8.0.edge
  • ruby 2.6.5 and 2.7

Possibly related issues

I did look through the existing issues, and while I wasn't able to find any that matches the issue exactly, there are quite a few that seem related. Many are older, though, and/or missing reproduction steps:
