[WIP] Issue with emails attached to an email #43
After some debugging, I came to realise this might be an unintended side-effect of the mechanism by which Fetch deals with multipart emails (plaintext/HTML), specifically Message::processStructure(). That method could conceivably be extended or modified to allow passing of some configuration parameter to prevent this behaviour, but who knows if that's even a desired goal of the Fetch project? Should I think about writing a patch for this and contributing it, or should I seek another solution? Ideally our users would not be attaching emails to other emails in the first place, but part of the point of this project is to reduce the need for behavioural change on the part of our users. |
I'd rather this get fixed in such a way that it does not require a configuration setting, but simply works for the use case you're providing (i.e., that .eml files get handled accordingly). I'd be happy to see any patch you come up with. |
To provide some more information: compare:
to:
I believe this is because Outlook is not providing filenames for them, since they did not originate from the filesystem. Detecting the "RFC822" value for

Update: It appears that -- even saving the .eml file to disk before attaching it to another email -- either Outlook is removing the filenames when it sends them, or PHP's

Another Update: In learning more about SMTP/IMAP and the RFC822/RFC2822 email formats, I've learned that the Content-Disposition header is where the filename is defined. Sending an email as an attachment to another email results in no filename being defined in the Content-Disposition header for that attachment, apparently regardless of which email client I send the email from. I'm beginning to think that this is an intended feature. The confusing part to me, however, is that other email clients successfully receive the attached email files (with the correct filenames) despite the filename and file extension never being defined within the email contents. I'm really not sure where it could get that information from if it wasn't included in the email data. I am continuing my investigation and will update this comment as I learn more. |
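The missing-filename behaviour described in this comment can be reproduced outside of Fetch. The following is a minimal sketch in Python (not PHP, and not Fetch's code) using only the standard `email` package; the function names `build_forwarded_as_attachment` and `describe_attachments` are hypothetical and exist only for this illustration. It builds an email with one attached email and one ordinary file, then lists each attachment's MIME type and filename.

```python
from email.message import EmailMessage


def build_forwarded_as_attachment() -> EmailMessage:
    """Build an outer email carrying one attached email and one ordinary file."""
    inner = EmailMessage()
    inner["Subject"] = "Quarterly figures"
    inner.set_content("Body of the attached email.")

    outer = EmailMessage()
    outer["Subject"] = "Forwarded as attachment"
    outer.set_content("See the attached message.")
    # Attaching a message object yields a message/rfc822 part whose
    # Content-Disposition is "attachment" with no filename parameter.
    outer.add_attachment(inner)
    # An ordinary file attachment, for contrast, carries a filename.
    outer.add_attachment(b"spreadsheet bytes", maintype="application",
                         subtype="octet-stream", filename="figures.xlsx")
    return outer


def describe_attachments(msg: EmailMessage):
    """Return (content type, filename or None) for each attachment part."""
    return [(part.get_content_type(), part.get_filename())
            for part in msg.iter_attachments()]


if __name__ == "__main__":
    for content_type, filename in describe_attachments(build_forwarded_as_attachment()):
        print(content_type, filename)
```

The attached email is identifiable only by its `message/rfc822` type, so any filename shown by a receiving client presumably has to be derived by that client, for example from the attached email's own headers.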
After much reading and experimentation (and a helpful prod from Gmail), I opted to take the "make up a fake filename" route to solve this problem (and I'm comforted in the knowledge that Google chose the same solution). Basically the problem occurs -- whether by accident or by design is unclear -- like so: when "forwarding as attachment" (attaching an email to another email as a file), regardless of which email client the message originated from, a Simply put, my patch detects these nameless attached emails (they still have a MIME type of |
…owing them to be seen as attachments rather than merged into the parent email.
…rrected spelling error
Thanks for putting so much work into this! I'm going to take a look at it next week (personal stuff has kept me busy the last week) and will either merge it or provide feedback (first impressions though make me pretty positive it'll just get merged in). Thanks so much for the pull request! |
I'm glad to help. Down the line I may figure out a way to process the .eml attachment and grab the subject line to use as the file name. Outlook seems to do that, so it might be a good practice to follow. It's currently trailing behind on my to-do list, but I'll either make another pull request or append it to this one (depending on the status of this one at that time). |
…xtraction of attached .eml file's subject line for use as the filename in addAttachment, with 'email.eml' as fallback.
I got frustrated with what I was supposed to be working on and gave myself a nice relaxing distraction by grabbing the content of any .eml attachments and extracting the Subject line for use as a filename. |
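The idea of naming a nameless attached email after its own Subject line, with 'email.eml' as the fallback, can be sketched as follows. This is an illustration in Python using the standard `email` package, not the PHP code from this pull request; `eml_filename` is a hypothetical helper name, and the sketch deliberately ignores encoded or unsafe subject lines.

```python
from email import message_from_string


def eml_filename(raw_eml: str, fallback: str = "email.eml") -> str:
    """Pick a filename for an attached email from its Subject header.

    Falls back to a fixed name when the header is missing or empty.
    """
    msg = message_from_string(raw_eml)
    subject = (msg.get("Subject") or "").strip()
    return f"{subject}.eml" if subject else fallback


if __name__ == "__main__":
    print(eml_filename("Subject: Weekly status\n\nbody text"))
    print(eml_filename("From: someone@example.com\n\nbody text"))
```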
Discovered an issue with my method of grabbing the subject line verbatim -- if it contains non-ASCII characters, the originating client will encode the subject line as an RFC 2047 encoded-word (such as "=?utf-8?B?SGVsbG8gV29ybGQ=?=" -- described here), resulting in the filename being a Base64 or Quoted-Printable encoded string, rather than the original subject line. I have devised a solution to this issue, but have not completed testing for it yet. I will commit it to this branch when I believe it's ready for inclusion. A further complication is saving the file (and accessing it via a URL in a browser) when the decoded filename contains non-URL-safe characters such as ?&%= or non-filesystem-safe characters; I am currently working on implementing a patch for that. |
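Both problems raised in this comment (encoded subject lines and unsafe filename characters) can be illustrated with a short sketch. It is written in Python with the standard library, not PHP, and it is not the patch committed to this branch. The helper names `decode_subject` and `safe_filename`, and the choice of which characters to allow, are assumptions made for the example.

```python
import re
from email.header import decode_header, make_header


def decode_subject(raw_subject: str) -> str:
    """Decode RFC 2047 encoded-words (Base64 or Quoted-Printable) to text."""
    return str(make_header(decode_header(raw_subject)))


def safe_filename(subject: str, fallback: str = "email") -> str:
    """Turn a decoded subject into a conservative .eml filename.

    Assumption for this sketch: only ASCII letters, digits, '.', '_' and '-'
    are kept; every other run of characters (spaces, ?&%=, slashes, and also
    non-ASCII letters) collapses to a single underscore.
    """
    cleaned = re.sub(r"[^A-Za-z0-9._-]+", "_", subject).strip("._")
    return (cleaned or fallback) + ".eml"


if __name__ == "__main__":
    subject = decode_subject("=?utf-8?B?SGVsbG8gV29ybGQ=?=")
    print(subject)
    print(safe_filename(subject))
    print(safe_filename("Re: Q&A 100%?"))
```

Replacing non-ASCII letters is lossy; a real implementation might prefer to keep them and percent-encode the name only when building a URL.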
… using it as the filename.
… to succeed on malformed .eml files, but fail on correctly-formatted .eml files.
Changes Unknown when pulling 45741af on AdrianTP:issue43 into * on tedivm:master*. |
…not exist' and 'It seems like can also be type ; however, does only seem to accept , maybe add an additional type check?').
@tedivm I am having trouble figuring out why this isn't passing build. Also, suddenly clicking the "Details" link takes me to Scrutinizer, whereas before it took me to the actual build log, where I could see what errors were being generated. Now I have no clue what's failing build or why or how to fix it. Edit: I figured out how to get to the latest build on Travis, but it still doesn't tell me anything useful about why I'm not passing build. It stops here:
|
Ignore the scrutinizer thing, that's purely informational. Travis-CI is where the issue is occurring- https://travis-ci.org/tedivm/Fetch/jobs/25022348 The unit testing is working fine (although it seriously drops code coverage, meaning you'll probably need to flesh out the testing for what you've added)- the test that fails is the coding standards one. Here's a quick guide to running php-cs-fixer on the project. Two commands, really simple and the tests will start passing again. https://github.com/tedivm/Fetch/blob/master/CONTRIBUTING.md#code-styling |
The steps in the Readme aren't working.
doesn't give any errors, but
fails with |
…e function declaration.
Try
|
I get the following error (same as last time):
Running |
I installed PHP-CS-Fixer globally by following the Edit: it looks like Travis didn't notice my push. |
… signed email. Changed regex to be more specific and resilient.
… again. Replaced all tabs with spaces.
Ah, the problem is that there have been a lot of updates in the mainline of code, and your branch is rather old. It's also making the pull request unable to work on an automatic level, meaning we're getting into cherry picking territory here. |
Should I do a local merge from master and recommit it to this branch? |
Give Message class constants to access the imap flags
OK @tedivm, I tried to update my fork to your master and merge it into my issue43 branch and I'm pretty sure I completely screwed it up. I've never done that sort of thing before. @decsrv I apologise for further delaying this. If you know how to catch this up to master and merge it in, please feel free, as I am apparently incapable. :( |
…sing on message/rfc822 attachments in order to avoid mangling the file.
…thing with a disposition of 'attachment' will be added to the array rather than inlined.
@tedivm and @decsrv -- Scratch my previous comment. I reset my head to before the rebase and started over, and this time I got it to work. I've never done that before, so it was super awkward. In any case, it should be good to go, I think. The Travis CI build is still in progress as of this posting, so I cannot say for sure if everything's alright. |
…e/rebase apparently enveloped.
Changes Unknown when pulling b6fa0f4 on AdrianTP:issue43 into * on tedious:master*. |
Changes Unknown when pulling 0c3f751 on AdrianTP:issue43 into * on tedious:master*. |
Aww yeah 😎 build passed. |
@tedivm How does this pull request look now? Afaik I have it up-to-date with master, at least as of 22 days ago. Is there anything you'd like me to change/update? |
So this PR is now 50 days behind master. Do I need to do another rebase? |
Yeah, it needed to be rebased. I've added a bunch of other stuff in (to prevent all of those PRs from needing to be rebased), but that means the queue is pretty clear of things that can be immediately merged. If you can fix this up one more time we should be able to pull it in quickly. One thing I want to add though is that it would be amazingly helpful if you could run through the test suite and see if you can get the line code coverage back up above the 90% mark for these changes. Thanks! |
An email with an embedded email can have the following structure:

1: "text/plain"
2: "message/rfc822"
2: "multipart/mixed"
2.1: "text/plain"
2.2: "application/octet-stream"
2.3: "application/octet-stream"

Before this fix this structure was parsed as:

1: "text/plain"
2: "message/rfc822"
2.1: "multipart/mixed"
2.1.1: "text/plain"
2.1.2: "application/octet-stream"
2.1.3: "application/octet-stream"

Hence, downloading attachments was not possible due to wrong part identifiers.

resolves tedious#188, tedious#43
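The part numbering described in the commit message above follows the IMAP convention that a multipart body directly inside a message/rfc822 part does not add a level of its own. A minimal sketch of that rule, in Python rather than PHP and not taken from Fetch: the `Part` class and the `number_parts` and `_number_children` functions are hypothetical names for this example, and the sketch assumes every message/rfc822 part has exactly one body child.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Part:
    """A simplified MIME part: its content type and any child parts."""
    content_type: str
    children: List["Part"] = field(default_factory=list)


def _number_children(children: List[Part], prefix: str) -> List[Tuple[str, str]]:
    numbered = []
    for index, child in enumerate(children, start=1):
        part_id = f"{prefix}{index}"
        numbered.append((part_id, child.content_type))
        if child.content_type.startswith("multipart/"):
            numbered += _number_children(child.children, part_id + ".")
        elif child.content_type == "message/rfc822" and child.children:
            body = child.children[0]
            if body.content_type.startswith("multipart/"):
                # The embedded multipart shares the identifier of the
                # message/rfc822 part; its children become id.1, id.2, ...
                numbered.append((part_id, body.content_type))
                numbered += _number_children(body.children, part_id + ".")
            else:
                numbered.append((part_id + ".1", body.content_type))
    return numbered


def number_parts(root: Part) -> List[Tuple[str, str]]:
    """Assign part identifiers to every part below the top-level message."""
    if root.content_type.startswith("multipart/"):
        return _number_children(root.children, "")
    return [("1", root.content_type)]


if __name__ == "__main__":
    embedded = Part("multipart/mixed", [
        Part("text/plain"),
        Part("application/octet-stream"),
        Part("application/octet-stream"),
    ])
    message = Part("multipart/mixed", [
        Part("text/plain"),
        Part("message/rfc822", [embedded]),
    ])
    for part_id, content_type in number_parts(message):
        print(f'{part_id}: "{content_type}"')
```

Run on the example structure, the sketch yields the first listing from the commit message (2.1 to 2.3 for the embedded parts) rather than the 2.1.1 to 2.1.3 identifiers produced before the fix.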
We're using Fetch to grab emails and process attachments. I sent an email from Outlook for Mac with two other emails (.eml files) attached to it, one of which had a .xlsx file attached to it. Fetch only returns the .xlsx file as an attachment of the main email, completely ignoring (or parsing through) the attached .eml files. Is this intentional behaviour?
Structure of the email I sent:
Structure of the data returned by ::getAttachments():
Am I just missing some configuration option or flag or something which will prevent this behaviour? I'm not able to ascertain just where in the code this manipulation/discarding of the attachments is occurring. Our intended use requires receiving unmodified attachments for processing and storage -- even .eml files (the contents of these .eml files, not their attachments, are what we're after, in most cases).