Skip to content

[WIP] Issue with emails attached to an email #43

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 35 commits into
base: master
Choose a base branch
from

Conversation

AdrianTP
Copy link

@AdrianTP AdrianTP commented Apr 2, 2014

We're using Fetch to grab emails and process attachments. I sent an email from Outlook for Mac with two other emails (.eml files) attached to it, one of which had a .xlsx file attached to it. Fetch only returns the .xlsx file as an attachment of the main email, completely ignoring (or parsing through) the attached .eml files. Is this intentional behaviour?

Structure of the email I sent:

main email
    .eml file
    .eml file
        .xlsx file

Structure of the data returned by ::getAttachments():

Array
(
    [0] => Fetch\Attachment Object
        (
            [structure:protected] => stdClass Object
                (
                    [type] => 3
                    [encoding] => 3
                    [ifsubtype] => 1
                    [subtype] => VND.OPENXMLFORMATS-OFFICEDOCUMENT.SPREADSHEETML.SHEET
                    [ifdescription] => 0
                    [ifid] => 0
                    [bytes] => 24576
                    [ifdisposition] => 1
                    [disposition] => attachment
                    [ifdparameters] => 1
                    [dparameters] => Array
                        (
                            [0] => stdClass Object
                                (
                                    [attribute] => filename
                                    [value] => document.xlsx
                                )

                        )

                    [ifparameters] => 1
                    [parameters] => Array
                        (
                            [0] => stdClass Object
                                (
                                    [attribute] => name
                                    [value] => document.xlsx
                                )

                        )

                )

            [messageId:protected] => 830
            [imapStream:protected] => Resource id #43
            [partId:protected] => 3.1.2
            [filename:protected] => document.xlsx
            [size:protected] => 24576
            [data:protected] => 
            [mimeType] => application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
            [encoding] => 3
        )

)

Am I just missing some configuration option or flag or something which will prevent this behaviour? I'm not able to ascertain just where in the code this manipulation/discarding of the attachments is occurring. Our intended use requires receiving unmodified attachments for processing and storage -- even .eml files (the contents of these .eml files, not their attachments, are what we're after, in most cases).

@AdrianTP
Copy link
Author

After some debugging, I came to realise this might be an unintended side-effect of the mechanism by which Fetch deals with multipart emails (plaintext/HTML), specifically Message::processStructure(). That method could conceivably be extended or modified to allow passing of some configuration parameter to prevent this behaviour, but who knows if that's even a desired goal of the Fetch project? Should I think about writing a patch for this and contributing it, or should I seek another solution? Ideally our users would not be attaching emails to other emails in the first place, but part of the point of this project is to reduce the need for behavioural change on the part of our users.

@tedivm
Copy link
Member

tedivm commented Mar 27, 2014

I'd rather this get fixed in such a way that it does not require a configuration setting, but simple works for the use case you're providing (i.e., that .eml files get handled accordingly). I'd be happy to see any patch you come up with.

@AdrianTP
Copy link
Author

AdrianTP commented Apr 1, 2014

To provide some more information:
It seems the reason the .eml files are being ignored as attachments and instead get sucked into the "merge the body parts" recursion is because they are not given a "filename" (in $structure->dparameters) or "name" (in $structure->parameters)

compare:

{
        "type": 2,
        "encoding": 0,
        "ifsubtype": 1,
        "subtype": "RFC822",
        "ifdescription": 0,
        "ifid": 1,
        "id": "<[email protected]>",
        "lines": 147,
        "bytes": 9776,
        "ifdisposition": 1,
        "disposition": "attachment",
        "ifdparameters": 1,
        "dparameters": [{
            "attribute": "creation-date",
            "value": "Thu, 27 Mar 2014 16:45:22 GMT"
        }, {
            "attribute": "modification-date",
            "value": "Thu, 27 Mar 2014 16:45:22 GMT"
        }],
        "ifparameters": 0,
        "parameters": {},
        "parts": [{"..."}]
}

to:

{
        "type": 5,
        "encoding": 3,
        "ifsubtype": 1,
        "subtype": "JPEG",
        "ifdescription": 1,
        "description": "Turkish_Van_Cat.jpg",
        "ifid": 1,
        "id": "<[email protected]>",
        "bytes": 2351204,
        "ifdisposition": 1,
        "disposition": "attachment",
        "ifdparameters": 1,
        "dparameters": [{
            "attribute": "filename",
            "value": "Turkish_Van_Cat.jpg"
        }, {
            "attribute": "size",
            "value": "1718186"
        }, {
            "attribute": "creation-date",
            "value": "Thu, 27 Mar 2014 16:45:23 GMT"
        }, {
            "attribute": "modification-date",
            "value": "Thu, 27 Mar 2014 16:45:23 GMT"
        }],
        "ifparameters": 1,
        "parameters": [{
            "attribute": "name",
            "value": "Turkish_Van_Cat.jpg"
        }]
}

I believe this is because Outlook is not providing filenames for them, since they did not originate from the filesystem. Detecting the "RFC822" value for $structure->subtype and generating an artificial "filename" for $structure->dparameters works, but seems like the wrong way to do it (not in the least because every attached .eml file would then inherit identical filenames). I'm looking into possible solutions, and I'll run an experiment to see if I can get Outlook to define filenames for these "RFC822" attachments.

Update:

It appears that -- even saving the .eml file to disk before attaching it to another email -- either Outlook is removing the filenames when it sends them, or PHP's imap_fetchstructure() is ignoring or removing the filenames of anything with a $structure->subtype of "RFC822." The result received from imap_fetchstructure in Message::getStructure() does not contain any filename information for any RFC822-formatted files.

Another Update:

In learning more about SMTP/IMAP and the RFC822/RFC2822 email formats, I've learned that the Content-Disposition header is where the filename is defined. Sending an email as an attachment to another email results in no filename being defined in the Content-Disposition header for that attachment, apparently regardless of which email client I send the email from. I'm beginning to think that this is an intended feature. The confusing part to me, however, is that other email clients successfully receive the attached email files (with the correct filenames) despite the filename and file extension never being defined within the email contents. I'm really not sure where it could get that information from if it wasn't included in the email data. I am continuing my investigation and will update this comment as I learn more.

@AdrianTP
Copy link
Author

AdrianTP commented Apr 2, 2014

After much reading and experimentation (and a helpful prod from Gmail), I opted to take the "make up a fake filename" route to solve this problem (and I'm comforted in the knowledge that Google chose the same solution). Basically the problem occurs -- whether by accident or by design is unclear -- like so: when "forwarding as attachment" (attaching an email to another email as a file), regardless of which email client the message originated from, a filename= string is not included in the Content-Disposition header of the data describing the attached email (which leads to my suspicion that this is intentional, despite it not being explicitly mentioned in RFC822). This results in Fetch treating the body of the attached email as yet another part of the multipart structure and thus it chooses to merge the text and html content into the main email, bringing any further-nested attachments up to the top level -- essentially flattening the tree structure. While this behaviour makes sense to some extent when dealing with emails from antiquated clients which forward emails as attachments by default (putting the burden of inlining the content on the recieving client rather than on the sending client), it is detrimental behaviour in my use-case.

Simply put, my patch detects these nameless attached emails (they still have a MIME type of message/rfc822), generates a filename for them if none exists (Gmail names them noname.eml, but I chose email.eml), pushes them onto the attachments array, and carries on with the usual process.

AdrianTP added 2 commits April 2, 2014 18:28
…owing them to be seen as attachments rather than merged into the parent email.
@tedivm
Copy link
Member

tedivm commented Apr 12, 2014

Thanks for putting so much work into this! I'm going to take a look at it next week (personal stuff has kept me busy the last week) and will either merge it or provide feedback (first impressions though make me pretty positive it'll just get merged in).

Thanks so much for the pull request!

@AdrianTP
Copy link
Author

I'm glad to help. Down the line I may figure out a way to process the .eml attachment and grab the subject line to use as the file name. Outlook seems to do that, so it might be a good practise to follow. It's currently trailing behind on my to-do list, but I'll either make another pull request or append it to this one (depending on the status of this one at that time).

…xtraction of attached .eml file's subject line for use as the filename in addAttachment, with 'email.eml' as fallback.
@AdrianTP
Copy link
Author

I got frustrated with what I was supposed to be working on and gave myself a nice relaxing distraction by grabbing the content of any .eml attachments and extracting the Subject line for use as a filename.

@AdrianTP
Copy link
Author

Discovered an issue with my method of grabbing the subject line verbatim -- if it contains non-ASCII characters, the originating client will encode the subject line (such as "=?utf-8?B?SGVsbG8gV29ybGQ=?" -- described here), resulting in the filename being a Base64 or Quoted-Printable encoded string, rather than the original subject line. I have devised a solution to this issue, but have not completed testing for it yet. I will commit it to this branch when I believe it's ready for inclusion.

An additional layer on the issue is saving the file (and accessing it via a URL in a browser) if the decoded filename contains non-URL-safe characters such as ?&%= and non-filesystem-safe characters; I am currently working on implementing a patch for that.

@tedivm tedivm changed the title Issue with emails attached to an email [WIP] Issue with emails attached to an email Apr 24, 2014
AdrianTP added 3 commits May 12, 2014 15:00
… to succeed on malformed .eml files, but fail on correctly-formatted .eml files.
@coveralls
Copy link

Coverage Status

Changes Unknown when pulling 45741af on AdrianTP:issue43 into * on tedivm:master*.

…not exist' and 'It seems like can also be type ; however, does only seem to accept , maybe add an additional type check?').
@AdrianTP
Copy link
Author

@tedivm I am having trouble figuring out why this isn't passing build. Also, suddenly clicking the "Details" link takes me to Scrutinizer, whereas before it took me to the actual build log, where I could see what errors were being generated. Now I have no clue what's failing build or why or how to fix it.

Edit: I figured out how to get to the latest build on Travis, but it still doesn't tell me anything useful about why I'm not passing build. It stops here:

Testing for Coding Styling Compliance.
All code should follow PSR standards.
   1) src/Fetch/Message.php (�[33mindentation, trailing_spaces, phpdoc_params, return, braces, controls_spaces, elseif�[0m)

�[31;1mThe command "./tests/runTests.sh" exited with 1.�[0m

@tedivm
Copy link
Member

tedivm commented May 12, 2014

Ignore the scrutinizer thing, that's purely informational. Travis-CI is where the issue is occurring-

https://travis-ci.org/tedivm/Fetch/jobs/25022348

The unit testing is working fine (although it seriously drops code coverage, meaning you'll probably need to flesh out the testing for what you've added)- the test that fails is the coding standards one.

Here's a quick guide to running php-cs-fixer on the project. Two commands, really simple and the tests will start passing again. https://github.com/tedivm/Fetch/blob/master/CONTRIBUTING.md#code-styling

@AdrianTP
Copy link
Author

The steps in the Readme aren't working.

composer install --dev

doesn't give any errors, but

vendor/bin/php-cs-fixer fix ./ --level="all" -vv

fails with vendor/bin/php-cs-fixer: No such file or directory. PHP-CS-Fixer does not appear to have been installed automatically. As far as I can tell I have the latest code from master.

@tedivm
Copy link
Member

tedivm commented May 13, 2014

Try

./vendor/bin/php-cs-fixer fix ./ --level="all" -vv

@AdrianTP
Copy link
Author

I get the following error (same as last time):

./vendor/bin/php-cs-fixer: No such file or directory

Running composer install --dev only seems to install dovecottesting. In fact, reading the composer.json shows only "tedivm/dovecottesting": "1.2.1" as a dependency for require-dev.

@AdrianTP
Copy link
Author

I installed PHP-CS-Fixer globally by following the curl instructions at http://cs.sensiolabs.org/, ran php-cs-fixer fix ./ --level="all" -vv in the Fetch directory, and pushed the resulting changes in 881a3ff. Let's see what Travis has to say now...

Edit: it looks like Travis didn't notice my push.

AdrianTP added 2 commits May 13, 2014 17:18
… signed email. Changed regex to be more specific and resilient.
@tedivm
Copy link
Member

tedivm commented May 14, 2014

Ah, the problem is that there has been a lot of updates in the mainline of code, and your branch is rather older. It's also making the pull request unable to work on an automatic level, meaning we're getting into cherry picking territory here.

@AdrianTP
Copy link
Author

Should I do a local merge from master and recommit it to this branch?

Give Message class constants to access the imap flags
@AdrianTP
Copy link
Author

OK @tedivm, I tried to update my fork to your master and merge it into my issue43 branch and I'm pretty sure I completely screwed it up. I've never done that sort of thing before.

@decsrv I apologise for further delaying this. If you know how to catch this up to master and merge it in, please feel free, as I am apparently incapable. :(

AdrianTP added 16 commits July 30, 2014 00:27
…owing them to be seen as attachments rather than merged into the parent email.
…xtraction of attached .eml file's subject line for use as the filename in addAttachment, with 'email.eml' as fallback.
… to succeed on malformed .eml files, but fail on correctly-formatted .eml files.
…not exist' and 'It seems like can also be type ; however, does only seem to accept , maybe add an additional type check?').
… signed email. Changed regex to be more specific and resilient.
…sing on message/rfc822 attachments in order to avoid mangling the file.
…thing with a disposition of 'attachment' will be added to the array rather than inlined.
@AdrianTP
Copy link
Author

@tedivm and @decsrv -- Scratch my previous comment. I reset my head to before the rebase and started over, and this time I got it to work. I've never done that before, so it was super awkward. In any case, it should be good to go, I think. The Travis CI build is still in progress as of this posting, so I cannot say for sure if everything's alright.

@coveralls
Copy link

Coverage Status

Changes Unknown when pulling b6fa0f4 on AdrianTP:issue43 into * on tedious:master*.

@coveralls
Copy link

Coverage Status

Changes Unknown when pulling 0c3f751 on AdrianTP:issue43 into * on tedious:master*.

@AdrianTP
Copy link
Author

Aww yeah 😎 build passed.

@AdrianTP
Copy link
Author

@tedivm How does this pull request look now? Afaik I have it up-to-date with master, at least as of 22 days ago. Is there anything you'd like me to change/update?

@AdrianTP
Copy link
Author

So this PR is now 50 days behind master. Do I need to do another rebase?

@tedivm
Copy link
Member

tedivm commented Sep 25, 2014

Yeah, it needed to be rebased. I've added a bunch of other stuff in (to prevent all of those PRs from needing to be rebased), but that means the queue is pretty clear of things that can be immediately merged. If you can fix this up one more time we should be able to pull in it quickly.

One thing I want to add though is that it would be amazingly helpful if you could run through the test suite and see if you can get the line code coverage back up above the 90% mark for these changes.

Thanks!

klammbueddel pushed a commit to klammbueddel/Fetch that referenced this pull request Oct 19, 2017
An email with an embedded email can have the following structure:

1: "text/plain"
2: "message/rfc822"
2: "multipart/mixed"
2.1: "text/plain"
2.2: "application/octet-stream"
2.3, "application/octet-stream"

Before this fix this structure was parsed as

1: "text/plain"
2: "message/rfc822"
2.1: "multipart/mixed"
2.1.1: "text/plain"
2.1.2: "application/octet-stream"
2.1.3, "application/octet-stream"

Hence, downloading attachments was not possible due to wrong part
identifiers

resolves tedious#188, tedious#43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants