Charset in imap doesn't work correctly #48
Comments
OK, I analyzed the problem and worked out a solution.
This should be true for the headers, but the email content is encoded using the encoding specified in the Content-Type header of the email.
That's not all, because the content could be of this type:
Content-Type: text/plain; charset="ISO-8859-1"
The real problem is that if we read content encoded in ISO-8859-1 with the UTF8 decoder, we lose information: if the body contains culture-specific characters (like òèàùàè), they are interpreted as '?'.
Storing the RawBody as a string is not bad in itself, since C# strings use 16 bits per char (they are Unicode), but just before the mail.Load(body.ToString(), headersonly); in the GetMessages() method
At this point there is another problem, because the implicit operator that casts a MailMessage does not take the encoding into account at all. The Attachment.GetData() method is wrong, and attachment.ContentType is wrong too, because they ignore the original encoding of the various parts.
I found a working solution (a workaround) for my purposes; it was simple because UTF-8 has the characters of my language.
I hope these considerations help someone find a smarter solution, because unfortunately I do not have time to do it right now.
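The information loss described above can be reproduced in a few lines. This is a minimal sketch in Python, used here only for illustration and not taken from the library's C# code; the sample string and variable names are assumptions. It shows that bytes produced with ISO-8859-1 cannot be recovered once they have been pushed through a UTF-8 decoder, while decoding with the declared charset keeps them intact.

```python
# Body bytes as they would arrive on the wire for charset="ISO-8859-1".
raw_body = "òèàù".encode("iso-8859-1")

# Wrong charset: every one of these bytes is an invalid UTF-8 sequence,
# so the decoder substitutes a replacement character and the original
# byte value is gone from the resulting string.
wrong = raw_body.decode("utf-8", errors="replace")

# Charset declared in Content-Type: the text survives unchanged.
right = raw_body.decode("iso-8859-1")

print(repr(wrong))
print(repr(right))
```

The same reasoning applies to any decoder that substitutes a placeholder for invalid input: the choice of encoding has to be made while the data is still bytes.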
So you're saying you found a way to handle accented characters?
reporcello, you are right, this line is wrong: it should look something like this: charset is a string variable and should take its value from the body ContentType.
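The idea of taking the charset from the body's Content-Type can be sketched as follows. This is an illustration in Python using the standard `email` package, not the code from the comment above (which is not preserved here); the helper names `charset_from_content_type` and `decode_body`, and the `us-ascii` and `latin-1` fallbacks, are assumptions made for the example.

```python
from email.message import Message


def charset_from_content_type(content_type: str, default: str = "us-ascii") -> str:
    """Return the charset parameter of a Content-Type value, lowercased."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns None when no charset parameter is present.
    return msg.get_content_charset() or default


def decode_body(raw: bytes, content_type: str) -> str:
    """Decode raw body bytes with the charset named in Content-Type."""
    charset = charset_from_content_type(content_type)
    try:
        return raw.decode(charset, errors="replace")
    except LookupError:
        # Unknown charset label: fall back to latin-1, which maps every byte.
        return raw.decode("latin-1")


print(decode_body(b"\xf2\xe8\xe0", 'text/plain; charset="ISO-8859-1"'))
```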
This seems to be related to closed issue 49. Do you still have this problem with the latest version?
I still have this problem with the latest version. Issue #49 does not fix it. In my previous comment I added sample code showing how it should work properly. You might want to check it out.
And I think I have duplicated the problem here: #54
I made some changes in my local version that solved the problem for Western European languages, because UTF-8 can represent those characters.
reporcello:
Maybe we could start by reading the bytes as ASCII, then when we encounter a "=?something?" or a "charset=" (or any other header specifying encoding) we switch to the specified encoding and read the bytes.
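The "=?something?" form mentioned above is the encoded-word syntax from RFC 2047, where each encoded chunk of a header names its own charset. A minimal sketch of switching encodings per chunk, written in Python with the standard `email.header.decode_header` function; the wrapper name `decode_header_value` and the ASCII fallback are assumptions for the example.

```python
from email.header import decode_header


def decode_header_value(raw_value: str) -> str:
    """Decode a header value that may contain RFC 2047 encoded-words."""
    parts = []
    for chunk, charset in decode_header(raw_value):
        if isinstance(chunk, bytes):
            # Encoded chunks carry their own charset; plain chunks that sit
            # next to them come back as bytes with charset None.
            parts.append(chunk.decode(charset or "ascii", errors="replace"))
        else:
            # A header without any encoded-word is returned as str.
            parts.append(chunk)
    return "".join(parts)


print(decode_header_value("=?ISO-8859-1?Q?Caf=E9?="))
print(decode_header_value("=?UTF-8?B?w7LDqMOg?="))
```

Headers stay ASCII on the wire, so only the decoded chunks need the declared charset; bodies need the separate `charset=` handling from Content-Type.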
This is a working solution: #54 (comment) I have tested it on many Latin1- and UTF8-encoded mails and it decoded all of them without problems. It needs further testing and some adjustment.
The only way to truly solve issues like this is to write a parser that doesn't require the message data to be converted into a unicode string first. In other words, the MIME parser needs to parse byte arrays. See MimeKit for an example of a MIME parser that does this.
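To illustrate what parsing from bytes buys, here is a minimal sketch using Python's standard `email` package rather than MimeKit; it is not MimeKit's API, and the sample message is made up for the example. The parser receives the raw bytes, and each part is decoded only when it is read, using the charset that part declares.

```python
from email import message_from_bytes
from email.policy import default

# A raw message as bytes: the body is 8-bit ISO-8859-1, not valid UTF-8.
raw_message = (
    b"From: sender@example.com\r\n"
    b"Subject: =?ISO-8859-1?Q?Citt=E0?=\r\n"
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: text/plain; charset="ISO-8859-1"\r\n'
    b"Content-Transfer-Encoding: 8bit\r\n"
    b"\r\n"
    b"perch\xe9 gi\xe0 cos\xec\r\n"
)

# Parse the bytes directly; no up-front conversion to a string is needed.
message = message_from_bytes(raw_message, policy=default)

# Header encoded-words and the body charset are resolved per part.
subject = str(message["Subject"])
body = message.get_content().strip()

print(subject)
print(body)
```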
I think something should be done in the internal void SetBody(string value) method,
because the value assigned to Body has wrong characters:
Latin characters like 'à' and 'ò' are converted to '?'.