Add Encoding::CESU_8 since 2.7.0 #2610

Merged on Oct 31, 2021 (1 commit)

@ima1zumi commented on Oct 7, 2021

This adds a description of Encoding::CESU_8, which was introduced in Ruby 2.7.

irb(main):001:0> RUBY_VERSION
=> "2.6.6"
irb(main):002:0> 'a'.encode(Encoding::CESU_8)
Traceback (most recent call last):
        4: from /Users/mi/.asdf/installs/ruby/2.6.6/bin/irb:23:in `<main>'
        3: from /Users/mi/.asdf/installs/ruby/2.6.6/bin/irb:23:in `load'
        2: from /Users/mi/.asdf/installs/ruby/2.6.6/lib/ruby/gems/2.6.0/gems/irb-1.0.0/exe/irb:11:in `<top (required)>'
        1: from (irb):2
NameError (uninitialized constant Encoding::CESU_8)
irb(main):001:0> RUBY_VERSION
=> "2.7.0"
irb(main):002:0> 'a'.encode(Encoding::CESU_8)
=> "a"
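Because the constant is missing before 2.7, code that has to run on both versions could check for it first. The following is only a sketch: `cesu8_or_nil` is a hypothetical helper name, not something from Ruby or from this PR.

```ruby
# Hypothetical helper: return the CESU-8 bytes of str, or nil when this
# Ruby does not define Encoding::CESU_8 (for example Ruby 2.6).
def cesu8_or_nil(str)
  return nil unless Encoding.const_defined?(:CESU_8)

  str.encode(Encoding::CESU_8).bytes
end

p cesu8_or_nil("a")
```

On Ruby 2.6 this avoids the NameError shown above by returning nil instead.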

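CESU-8 appears to be an encoding that converts characters at code points U+010000 through U+10FFFF, which require a surrogate pair in UTF-16, into 6 bytes (one 3-byte sequence per surrogate).
Unicode does not recommend using it, so the added description says so.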
ref: https://www.unicode.org/reports/tr26/tr26-4.html

irb(main):007:0> '🍣'.encode(Encoding::CESU_8).bytes
=> [237, 160, 188, 237, 189, 163]
irb(main):008:0> '🍣'.bytes
=> [240, 159, 141, 163]
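The 6 bytes above can be reproduced by hand, which may make the difference from UTF-8 easier to see. This is a minimal sketch based on my reading of the technical report linked above; `cesu8_bytes` is a hypothetical helper, not part of Ruby or of this PR, and it does not validate its input.

```ruby
# Hypothetical helper: encode a single code point as CESU-8 bytes.
def cesu8_bytes(codepoint)
  # Code points below U+10000 are encoded exactly as in UTF-8.
  return [codepoint].pack("U").bytes if codepoint < 0x10000

  # Split the supplementary code point into a UTF-16 surrogate pair.
  offset = codepoint - 0x10000
  high = 0xD800 + (offset >> 10)
  low  = 0xDC00 + (offset & 0x3FF)

  # Encode each surrogate with the 3-byte UTF-8 bit layout.
  [high, low].flat_map do |s|
    [0xE0 | (s >> 12), 0x80 | ((s >> 6) & 0x3F), 0x80 | (s & 0x3F)]
  end
end

# U+1F363 is the sushi emoji used in the irb session above.
p cesu8_bytes(0x1F363)
# => [237, 160, 188, 237, 189, 163]
```

For U+1F363 the surrogate pair is D83C DF63, and each half becomes three bytes, which matches the output of `encode(Encoding::CESU_8)` above.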

ref: https://bugs.ruby-lang.org/issues/15931
ref: #2071

@znz merged commit 1f12c66 into rurema:master on Oct 31, 2021