
Incompatible character encodings with ruby 2.1.2 #9

Open
soundcheck2007 opened this issue Aug 20, 2014 · 2 comments · May be fixed by #19
Comments

@soundcheck2007

Sorry to bother again but after some testing I noticed another issue.

I have my email file in variable 'msg'. Performing the following command shows the expected output (same as Ruby 1.8.7)
msg.to_mime
=> #< Mime content_type="multipart/alternative" >

However, when I call to_s, I receive an error with Ruby 2.1.2
msg.to_mime.to_s
Encoding::CompatibilityError: incompatible character encodings: UTF-8 and ASCII-8BIT
from /Users/user/.rbenv/versions/2.1.2/gemsets/workers/gems/ruby-msg 1.5.2/lib/mapi/mime.rb:109:in `join'
from /Users/user/.rbenv/versions/2.1.2/gemsets/workers/gems/ruby-msg 1.5.2/lib/mapi/mime.rb:109:in `to_s'

When researching the issue, I replaced the following in lib/mapi/mime.rb#108:
part.to_s(opts)
with
part.to_s(opts).encode("UTF-8", :invalid=>:replace, :undef => :replace, :replace => "")

After I made that change to add the .encode in mime.rb, I was able to get my emails to convert correctly.
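One caveat with this workaround, shown as a minimal sketch with a made-up part body (not taken from a real message): when the source string is tagged ASCII-8BIT, every byte above 0x7F has no defined conversion to UTF-8, so `:undef => :replace` with an empty replacement silently drops it. If the bytes are already known to be UTF-8, relabelling them with `force_encoding` keeps them intact.

```ruby
# Hypothetical part body: the UTF-8 bytes for "café", tagged as ASCII-8BIT.
binary_part = "caf\xC3\xA9".b

# Same options as the workaround above.
converted = binary_part.encode("UTF-8", :invalid => :replace, :undef => :replace, :replace => "")

puts converted.encoding   # the result is tagged UTF-8
puts converted.inspect    # the two high bytes have been dropped

# Relabelling (no transcoding) keeps the bytes, assuming they really are UTF-8.
relabelled = binary_part.dup.force_encoding(Encoding::UTF_8)
puts relabelled.valid_encoding?
```

So the workaround avoids the exception, but non-ASCII text in the binary part may be lost along the way.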

I also see that the 'parts' of my email are different encodings which I believe is what the error is referring to.
irb(main):003:0> msg.to_mime.parts.each do |part|
irb(main):004:1* puts part.to_s.encoding
irb(main):005:1> end; nil
UTF-8
ASCII-8BIT
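For reference, the same error can be reproduced without ruby-msg. This is a minimal sketch using two made-up strings as stand-ins for the parts; `join_parts` is a hypothetical helper, not a method from the gem. Joining only fails when both strings contain non-ASCII bytes.

```ruby
# Stand-ins for two MIME parts; the contents are invented for illustration.
utf8_part   = "caf\u00e9"      # UTF-8 string containing a non-ASCII character
binary_part = "\xE9t\xE9".b    # raw bytes tagged ASCII-8BIT

# Hypothetical helper: joins parts and reports the error instead of raising.
def join_parts(parts)
  parts.join("\r\n")
rescue Encoding::CompatibilityError => e
  "#{e.class}: #{e.message}"
end

failed = join_parts([utf8_part, binary_part])
puts failed

# ASCII-only content is compatible across the two encodings, so this succeeds.
ok = join_parts(["abc", "def".b])
puts ok.inspect
```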

Alternatively, reverting the above change to mime.rb and instead adding the .encode call in lib/mapi/convert/note_mime.rb#159 on "props.body_html" also allows the email to convert correctly (that is the 'part' that is being encoded as ASCII-8BIT).

Not sure of the exact way to fix this permanently. If there is any other information I can provide I will be glad to do so.

Thanks for your help!

@aquasync
Owner

aquasync commented Sep 5, 2014

Yeah, a related (or the same?) problem has been mentioned here - #5. As mentioned there, I think the MIME parts should be treated as binary data (i.e. encoded with ASCII-8BIT), not encoded strings, as they describe their own text encoding through Content-Type. Indeed, a single message could have multiple parts with different encodings. I think the fix is to avoid the UTF-8 parts being introduced, which I think happens by way of strings in the source code being implicitly UTF-8. I imagine adding the magic "# encoding: ASCII-8BIT" comment to the offending source files would fix the issue.
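A rough sketch of the "treat everything as binary" idea, under stated assumptions: instead of the magic comment (which only takes effect on the first line of a source file), it tags each string explicitly with `String#b`. The `as_binary` helper and the sample strings are hypothetical and not part of ruby-msg.

```ruby
# Hypothetical helper: returns a copy of the string tagged ASCII-8BIT (bytes unchanged).
def as_binary(str)
  str.b
end

header      = "Content-Type: text/plain; charset=utf-8"  # literal, implicitly UTF-8
body_utf8   = "caf\u00e9"                                # UTF-8 text
body_latin1 = "caf\xE9".b                                # ISO-8859-1 bytes, already binary

# With every piece (and the separator) binary, join has nothing to reconcile.
message = [header, body_utf8, body_latin1].map { |s| as_binary(s) }.join("\r\n".b)

puts message.encoding
puts message.bytesize
```

The bytes of each part are left untouched, so a reader of the message can still decode each part according to its own Content-Type.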

@acolchagoff

I'm having this problem. Any recommended workaround?

garethrees added a commit to mysociety/ruby-msg that referenced this issue Sep 22, 2020
In fae72e5 we introduced a fallback for incompatible character
encodings, but that only covered the case where Mapi::Mime#is_multipart?
returns true.

This is a slight variation for the other branch of the conditional.

The error is being raised where we have a `@body` in a different
encoding to the default string encoding. In this case, just always force
the string to be the same encoding as `@body` so we don’t get an error.

This passes the spec for the particular file we’re seeing problems with,
but I’m not at all sure this is the right thing to do. Maybe we should
always treat anything in mapi as ASCII, and then force to UTF-8 for
display? [1]

Fixes mysociety/alaveteli#5783

[1] aquasync#9 (comment)
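The idea described in this commit message can be sketched roughly as follows. This is an illustration only, not the code from the commit: `body` stands in for `@body`, `prefix` for whatever string gets concatenated with it, and the final step shows one possible way to "force to UTF-8 for display" using `String#scrub` (available from Ruby 2.1).

```ruby
body   = "caf\xE9".b   # stand-in for @body, here tagged ASCII-8BIT
prefix = "Body: "      # stand-in for a string in the default (UTF-8) encoding

# Relabel the prefix to match the body's encoding before concatenating.
combined = prefix.dup.force_encoding(body.encoding) + body
puts combined.encoding

# For display only: relabel as UTF-8 and replace any invalid bytes.
display = combined.dup.force_encoding(Encoding::UTF_8).scrub("?")
puts display
```

Here the lone `\xE9` byte is not valid UTF-8, so it shows up as `?` in the display copy while `combined` keeps the original bytes.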
@faridco faridco linked a pull request Oct 7, 2020 that will close this issue