Body character coding problem #54
Comments
Can you forward an example message to me? [email protected] |
Hi, thanks for the help. |
Hi! I modified and compiled your code in the ImapClient.cs class, in the GetMessages() procedure, with the following code:
I have tested it with Unicode mails and they are now displayed correctly. The code is not nice, but it works. It might give you an idea of how to work around this problem. Best regards, |
Related to this problem, I found another one in the HeaderObject.cs class, in the SetBody(string value) procedure: var data = Convert.FromBase64String(value); That line should look something like this: value = System.Text.Encoding.GetEncoding(??BODY_CHARSET??).GetString(Convert.FromBase64String(value)); So the character encoding of the body (charset) should be stored in the object model somehow. That way you could refer to it any time you need to call a decoder function. |
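To illustrate the suggestion above, here is a minimal sketch of decoding a base64 body with an explicitly supplied charset. `BodyDecoder` and `DecodeBase64Body` are hypothetical names for illustration only; they are not part of AE.Net.Mail, and the sketch assumes the charset name has already been read from the part's headers.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of AE.Net.Mail.
public static class BodyDecoder
{
    // Decodes a base64-encoded body into text using the charset declared
    // for that body. Throws ArgumentException if the charset name is unknown.
    public static string DecodeBase64Body(string base64, string charset)
    {
        // Step 1: undo the transfer encoding to get the raw bytes back.
        byte[] raw = Convert.FromBase64String(base64);

        // Step 2: interpret those bytes with the declared charset.
        return Encoding.GetEncoding(charset).GetString(raw);
    }
}
```

The same text produces different bytes under different charsets, so the charset has to travel with the body: `DecodeBase64Body("aMOpdGVu", "utf-8")` and `DecodeBase64Body("aOl0ZW4=", "iso-8859-1")` both describe the word "héten", but swapping the two charset names would not decode it correctly.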
I can see two cases where your code will fail :
And i think the charset header is not even mandatory. |
Well, I know it's not perfect, but it works. If you check the code, you can see that it uses your OS default codepage if it can't find the charset in the body: string temp_charset = Encoding.Default.BodyName. |
Can you please provide me with a code block showing what you have changed? |
Hi, could you try with this code ? [Edited : the part where I decoded quoted-printable and base64 was just stupid........] Added in utilities :
internal static int LastIndexOfArray(Array fullArray, Array innerArray, int start = 0, int end = -1)
|
Please send me the whole GetMessages() function, because some of the declarations are missing from the above code. |
What is the cancellationPending variable? |
Sorry, you should just remove the lines "if (cancellationPending) throw new GetMessagesCancelledException();". I forgot to take them out; they are not part of the library. |
I think the regex needs to be adjusted, because I found a few emails that would make it fail. |
It's working with UTF8 mails but not with Latin1 characters.
Index was outside the bounds of the array. |
Have you tried the method in the rtf file I sent? So far I have noticed two bugs:
|
Yes with the rtf file it's working. Nice. |
In some cases it still appears unable to find the right decoder. I just received a mail that still contains '?' (question marks). |
Cool ! Just a thought : |
It's possible. As an example, my code won't work if there is no charset specified at all (it happens!) or if the headers of a part look like this:
The regex has to be updated, but I think the general idea is there.
|
yes, good thought. |
The code in HeaderObject.cs should work just fine. The value is converted to a byte and then it uses a StreamReader to decode it -- using |
Much simpler, better and faster version of the previous custom GetMessages : |
Thanks, I'm going to test it as soon as I can. |
So, I have tested your latest GetMessages() function with different mails, and here is the result: some of the mails with a Latin charset are displayed well, but in 1 case there is still a '?' character instead of Unicode chars. This is a text/html mail. There is another problem: line breaks are missing. In the original version there were \n\r line breaks, but now they are gone. |
I have forwarded a mail that is displayed incorrectly to: [email protected] |
Here is an example where your code fails: Mail snippet: --b1_03e47a899ab3248e32619e3ddc24148c Regards, |
Conclusion: returning to roots. I have tested your code block with 5 different mails and got the following result: 4 of 5 mails are displayed correctly, while 1 mail is still incorrect. I changed my GetMessages() function back to my previous version explained here: #54 (comment), and that code displays 5/5 mails correctly. So I'm returning to that version. |
Could you send me the full problematic email and tell me exactly where parsing fails? (Use GitHub messages.) Are the line breaks missing in every message? |
I have forwarded the mail. |
…ent is used, and default the charset to `ISO-8859-1`.
Hey guys, I think this is simply an issue of not using the right default character set. I've updated the code to use ISO-8859-1, Latin 1, as the default, and then updated the |
Hi Andy, so does this fix work for every kind of message, not just ISO-8859-1? |
You might want to use this code instead of hard-coding ISO-8859-1? --modified--
|
There is 1 case where your code fails: I've got a mail containing the following line: charset=binary. I have never seen this before, but it seems to be possible. public static Encoding ParseCharsetToEncoding(string characterSet)
When you call return Encoding.GetEncoding(characterSet); the characterSet variable holds the value "binary", and the call fails there. |
possible workaround:
or
|
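As a rough illustration of the failure described above, `Encoding.GetEncoding` throws an `ArgumentException` for a name it does not recognize, such as `binary`. The sketch below shows one way to guard against that. `CharsetResolver.Resolve` is a hypothetical name (not the library's `ParseCharsetToEncoding`), and falling back to ISO-8859-1 is an assumption taken from the default discussed in this thread.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of AE.Net.Mail.
public static class CharsetResolver
{
    // Maps a charset name from a Content-Type header to an Encoding.
    // Missing or unrecognized names (for example "binary") fall back to
    // ISO-8859-1 instead of throwing. The fallback choice is an assumption.
    public static Encoding Resolve(string characterSet)
    {
        Encoding fallback = Encoding.GetEncoding("iso-8859-1");

        if (string.IsNullOrWhiteSpace(characterSet))
            return fallback;

        try
        {
            // Header values are sometimes quoted, e.g. charset="UTF-8".
            return Encoding.GetEncoding(characterSet.Trim().Trim('"'));
        }
        catch (ArgumentException)
        {
            // The name is not a charset known to this runtime.
            return fallback;
        }
    }
}
```

With this sketch, `CharsetResolver.Resolve("binary")` returns the ISO-8859-1 encoding rather than throwing, while `CharsetResolver.Resolve("utf-8")` still returns UTF-8. Note that a silent fallback can hide a wrong guess, so logging the unrecognized name may be worth considering.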
…ader value when reading in attachments (closes #63)
Stack overflow in HeaderObject.cs, here:
By the way, these improvements look promising :) |
Oy. Can't believe I missed that. I should really learn to run the unit tests before checking in. :] |
Great work! All the messages are displayed correctly. Decoding can't be a problem anymore :). |
There is still 1 case where the message is displayed incorrectly:
This time the message is being decoded with ISO-8859-2 although it was encoded with UTF-8, so Unicode characters are displayed wrong. |
I don't understand how using the system's default encoding can be a solution. This probably works fine in simple cases where emails are exchanged between people using the same environment, but I'm sure that soon enough someone will open an issue saying "hey, I received an email from this unknown country with that particular encoding and it's not displaying correctly". |
It's a solution because as far as I can tell, SMTP headers are supposed to be encoded with Latin1. Anything that isn't Latin1 should be encoded in place with the |
@meehi I've set |
Fair enough, but I was thinking more about the body. Would it work in every case? |
Hmmm... I think there's still an issue. Shoot. To set the body, it switches to the encoding specified in the headers, but by that point, it's already been decoded. I've got an idea. I'll work on it tonight. |
Ok, I will wait for that idea and then test it. |
Looks like you guys seriously need to use MimeKit to solve this problem. |
Hi!
I'm working on a program in VS2010 (C#) and using your component to receive unseen messages. It works nicely, but today I received a mail that is not shown correctly and contains '?' characters in place of Unicode characters.
Here is the code I use:
System.Lazy<AE.Net.Mail.MailMessage>[] myMessages = _client2.SearchMessages(AE.Net.Mail.SearchCondition.Unseen());
foreach (System.Lazy<AE.Net.Mail.MailMessage> message in myMessages)
{
    AE.Net.Mail.MailMessage msg = message.Value;
}
I have added a watch on the "msg" variable and can see the following:
msg.AlternativeViews[0].Content -> A h�ten virtu�lis
msg.AlternativeViews[0].ContentEncoding -> 8bit
msg.BodyHtml -> the same as above, but with html tags
msg.AlternativeViews[0].Content should look like this -> A héten virtuális...
As you can see, the message body contains accented (Unicode) characters (they are Hungarian characters, and might be in ISO-8859-1 or ISO-8859-2).
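One plausible explanation for the symptom above is that the body bytes were written in a single-byte charset but read back with a different one. The sketch below reproduces that effect in isolation. It is not the library's code path, `MojibakeDemo.DecodeWith` is a hypothetical helper, and ISO-8859-1 and UTF-8 are used only as example charsets.

```csharp
using System;
using System.Text;

// Hypothetical demonstration helper, not part of AE.Net.Mail.
public static class MojibakeDemo
{
    // Encodes text with one charset and decodes the resulting bytes with
    // another, imitating a reader that guesses the charset wrong.
    public static string DecodeWith(string text, string writtenAs, string readAs)
    {
        byte[] raw = Encoding.GetEncoding(writtenAs).GetBytes(text);
        return Encoding.GetEncoding(readAs).GetString(raw);
    }
}
```

Calling `MojibakeDemo.DecodeWith("A héten virtuális", "iso-8859-1", "utf-8")` yields a string in which the accented letters are replaced by the Unicode replacement character (U+FFFD), because the single bytes for "é" and "á" are not valid UTF-8 sequences. Using the same charset on both sides round-trips the text unchanged.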