NotoNaskhArabic missing diacritic #258

Open
davelab6 opened this issue Feb 4, 2025 · 2 comments

@davelab6
Member

davelab6 commented Feb 4, 2025

Internally b/386702135 was reported, so I'm upcycling this here.

Font

NotoNaskhArabic-Regular.ttf

Where the font came from, and when

AOSP, and GF API

Issue

@mihnita investigated this and said:

TL;DR: the problem is in the font, which is missing a ligature mapping from 0627 0644 0644 0647 to FDF2.

The diacritics are 0651 0670 (ARABIC SHADDA, ARABIC LETTER SUPERSCRIPT ALEF)

The input does not matter, only the rendering.
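
For readers who do not have the code points memorized, here is a small sketch that looks them up with Python's standard `unicodedata` module. The `BASE` and `MARKS` names and the `describe` helper are mine, introduced only for illustration.

```python
import unicodedata

# Base letters of the word, as quoted in the report above.
BASE = [0x0627, 0x0644, 0x0644, 0x0647]
# The two diacritics named in the report above.
MARKS = [0x0651, 0x0670]


def describe(code_points):
    """Return (U+XXXX, character name, general category) for each code point."""
    return [
        (f"U+{cp:04X}", unicodedata.name(chr(cp)), unicodedata.category(chr(cp)))
        for cp in code_points
    ]


for row in describe(BASE + MARKS):
    print(*row)
```

The two diacritics come back with general category `Mn` (nonspacing mark), while the base letters are `Lo`.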

You can see the problem by visiting https://en.wikipedia.org/wiki/Allah on an Android device, and on a desktop.
Depending on the fonts on your desktop the first paragraphs there will render with or without the diacritics.
But more often than not (tried with various browsers on various OSes) it is rendered WITH diacritics.

On Android that page is rendered without diacritics.

The font contains the proper glyph in U+FDF2 (ARABIC LIGATURE ALLAH ISOLATED FORM) and that renders properly on Android.
See the first paragraph in the "Unicode" section.

I attached a test.html file that tries to render 0627 0644 0644 0647 and FDF2 with NotoNaskhArabic-Regular.ttf.

Download the attached test.html, download and unzip the font from https://fonts.google.com/noto/specimen/Noto+Naskh+Arabic, and copy the NotoNaskhArabic-Regular.ttf file from the zip into the same folder as test.html.
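
The attachment itself is not reproduced in this thread, so the following is only a guess at what an equivalent page could look like, generated from Python. The `TestNaskh` family name, the `build_test_page` helper, and the page layout are all assumptions, not the contents of the original test.html.

```python
from urllib.parse import quote

PLAIN = "\u0627\u0644\u0644\u0647"  # 0627 0644 0644 0647
LIGATURE = "\uFDF2"                 # ARABIC LIGATURE ALLAH ISOLATED FORM


def build_test_page(font_file="NotoNaskhArabic-Regular.ttf"):
    """Return HTML that renders both strings with a font file in the same folder."""
    # Percent-encode the file name: a literal '#' in a URL would start a fragment.
    src = quote(font_file)
    return f"""<!DOCTYPE html>
<html lang="ar" dir="rtl">
<head>
<meta charset="utf-8">
<style>
@font-face {{ font-family: "TestNaskh"; src: url("{src}"); }}
p {{ font-family: "TestNaskh"; font-size: 72px; }}
</style>
</head>
<body>
<p>{PLAIN}</p>
<p>{LIGATURE}</p>
</body>
</html>
"""


page = build_test_page()
print(len(page))
# To save it: pathlib.Path("test.html").write_text(page, encoding="utf-8")
```

The percent-encoding matters for file names that contain `#`, which a browser would otherwise treat as the start of a URL fragment.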

Image

I patched the font using the Python library fonttools:

$ ttx -s NotoNaskhArabic-Regular.ttf
Dumping "NotoNaskhArabic-Regular.ttf" to "NotoNaskhArabic-Regular.ttx"...
Dumping 'GlyphOrder' table...
Dumping 'head' table...
Dumping 'hhea' table...
Dumping 'maxp' table...
Dumping 'OS/2' table...
Dumping 'hmtx' table...
Dumping 'cmap' table...
Dumping 'prep' table...
Dumping 'loca' table...
Dumping 'glyf' table...
Dumping 'name' table...
Dumping 'post' table...
Dumping 'gasp' table...
Dumping 'GDEF' table...
Dumping 'GPOS' table...
Dumping 'GSUB' table...
Dumping 'STAT' table...

I edited the NotoNaskhArabic-Regular.G_S_U_B_.ttx file and found (at line 2610) the following entry:

      <Lookup index="39">
        <LookupType value="4"/>
        <LookupFlag value="16"/><!-- useMarkFilteringSet -->
        <!-- SubTableCount=1 -->
        <LigatureSubst index="0">
          <LigatureSet glyph="uni0627">
            <Ligature components="uni0644.init,uni0644.medi,uni0651,uni0670,uni0647.fina" glyph="uniFDF2"/>
            <Ligature components="uni0644.init,uni0644.medi,uni0651,uni0670,uni06C1.fina" glyph="uniFDF2"/>
          </LigatureSet>
        </LigatureSubst>
        <MarkFilteringSet value="3"/>
      </Lookup>

and I added entries for the sequences without diacritics:

      <Lookup index="39">
        <LookupType value="4"/>
        <LookupFlag value="16"/><!-- useMarkFilteringSet -->
        <!-- SubTableCount=1 -->
        <LigatureSubst index="0">
          <LigatureSet glyph="uni0627">
            <Ligature components="uni0644.init,uni0644.medi,uni0651,uni0670,uni0647.fina" glyph="uniFDF2"/>
            <Ligature components="uni0644.init,uni0644.medi,uni0651,uni0670,uni06C1.fina" glyph="uniFDF2"/>
            <Ligature components="uni0644.init,uni0644.medi,uni0647.fina" glyph="uniFDF2"/> <!-- added -->
            <Ligature components="uni0644.init,uni0644.medi,uni06C1.fina" glyph="uniFDF2"/> <!-- added -->
          </LigatureSet>
        </LigatureSubst>
        <MarkFilteringSet value="3"/>
      </Lookup>
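
The same hand edit can be sketched as a script over the TTX XML, using only the standard library. This is an illustration of the manual change above, not a claim that it is the right fix; the helper names `sequences` and `add_markless_variants` are hypothetical, and the XML is the `LigatureSet` excerpt from the dump rather than the whole file.

```python
import xml.etree.ElementTree as ET

# Glyph names copied from the dump above: shadda and superscript alef.
MARKS = {"uni0651", "uni0670"}

ORIGINAL_SET = """
<LigatureSet glyph="uni0627">
  <Ligature components="uni0644.init,uni0644.medi,uni0651,uni0670,uni0647.fina" glyph="uniFDF2"/>
  <Ligature components="uni0644.init,uni0644.medi,uni0651,uni0670,uni06C1.fina" glyph="uniFDF2"/>
</LigatureSet>
"""


def sequences(lig_set, target):
    """Return the full input glyph sequence of each ligature producing `target`."""
    first = lig_set.get("glyph")
    return [
        [first] + lig.get("components").split(",")
        for lig in lig_set.findall("Ligature")
        if lig.get("glyph") == target
    ]


def add_markless_variants(lig_set, marks=MARKS):
    """Append a mark-free copy of each ligature; return how many were added."""
    existing = {lig.get("components") for lig in lig_set.findall("Ligature")}
    added = 0
    for lig in list(lig_set.findall("Ligature")):
        parts = [g for g in lig.get("components").split(",") if g not in marks]
        stripped = ",".join(parts)
        if stripped not in existing:
            # New entries go after the originals, as in the manual edit.
            ET.SubElement(lig_set, "Ligature",
                          components=stripped, glyph=lig.get("glyph"))
            existing.add(stripped)
            added += 1
    return added


lig_set = ET.fromstring(ORIGINAL_SET)
before = sequences(lig_set, "uniFDF2")
added = add_markless_variants(lig_set)
after = sequences(lig_set, "uniFDF2")

for seq in after:
    print(" ".join(seq))
```

Before the patch every sequence that maps to `uniFDF2` contains both mark glyphs; afterwards two mark-free sequences are appended. To run it against the real dump, one would parse `NotoNaskhArabic-Regular.G_S_U_B_.ttx` with `ET.parse` and locate the matching `LigatureSet` first.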

Rebuilt the font:

$ ttx NotoNaskhArabic-Regular.ttx
Compiling "NotoNaskhArabic-Regular.ttx" to "NotoNaskhArabic-Regular#1.ttf"...
Parsing 'GlyphOrder' table...
Parsing 'head' table...
Parsing 'hhea' table...
Parsing 'maxp' table...
Parsing 'OS/2' table...
Parsing 'hmtx' table...
Parsing 'cmap' table...
Parsing 'prep' table...
Parsing 'loca' table...
Parsing 'glyf' table...
Parsing 'name' table...
Parsing 'post' table...
Parsing 'gasp' table...
Parsing 'GDEF' table...
Parsing 'GPOS' table...
Parsing 'GSUB' table...
Parsing 'STAT' table...

Edited test.html to use the new NotoNaskhArabic-Regular#1.ttf instead of NotoNaskhArabic-Regular.ttf.

And the result is

Image


WARNING 1: I have no opinion on whether "Allah" must always be written with diacritics.
It looks like it should be, but I don't speak, read, or write Arabic,
so I defer to native speakers.
I only tackled the technical part.

WARNING 2: I don't claim that this is the right fix.
I only know enough about fonts to be dangerous :-)
It might be that the proper fix requires a new <Lookup> element. Or a new <LigatureSubst> or a new <LigatureSet>.
But I am pretty sure that the cause of the bug is the font, and that it is a missing ligature mapping.
After all, the Unicode Decomposition Mapping (dm) for U+FDF2 is <isolated> 0627 0644 0644 0647
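
The decomposition mapping can be read straight from Python's `unicodedata` module, which exposes the same field as UnicodeData.txt. A minimal sketch; the `compat_decomposition` helper is a name I made up for this example.

```python
import unicodedata


def compat_decomposition(ch):
    """Split the decomposition field of `ch` into (tag, list of code points)."""
    fields = unicodedata.decomposition(ch).split()
    tag = fields[0] if fields and fields[0].startswith("<") else None
    code_points = [int(f, 16) for f in fields if not f.startswith("<")]
    return tag, code_points


tag, cps = compat_decomposition("\uFDF2")
print(tag, [f"{cp:04X}" for cp in cps])
```

Note that the mapping carries the `<isolated>` tag and contains only the four base letters, without 0651 or 0670.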

WARNING 3: It might be good to check that other very useful Arabic ligatures also have GSUB entries.
Probably the Unicode ligatures that already exist in the NotoNaskhArabic font:

FDF0;ARABIC LIGATURE SALLA USED AS KORANIC STOP SIGN ISOLATED FORM;Lo;0;AL;<isolated> 0635 0644 06D2;;;;N;;;;;
FDF1;ARABIC LIGATURE QALA USED AS KORANIC STOP SIGN ISOLATED FORM;Lo;0;AL;<isolated> 0642 0644 06D2;;;;N;;;;;
FDF2;ARABIC LIGATURE ALLAH ISOLATED FORM;Lo;0;AL;<isolated> 0627 0644 0644 0647;;;;N;;;;;
FDF3;ARABIC LIGATURE AKBAR ISOLATED FORM;Lo;0;AL;<isolated> 0627 0643 0628 0631;;;;N;;;;;
FDF4;ARABIC LIGATURE MOHAMMAD ISOLATED FORM;Lo;0;AL;<isolated> 0645 062D 0645 062F;;;;N;;;;;
FDF5;ARABIC LIGATURE SALAM ISOLATED FORM;Lo;0;AL;<isolated> 0635 0644 0639 0645;;;;N;;;;;
FDF6;ARABIC LIGATURE RASOUL ISOLATED FORM;Lo;0;AL;<isolated> 0631 0633 0648 0644;;;;N;;;;;
FDF7;ARABIC LIGATURE ALAYHE ISOLATED FORM;Lo;0;AL;<isolated> 0639 0644 064A 0647;;;;N;;;;;
FDF8;ARABIC LIGATURE WASALLAM ISOLATED FORM;Lo;0;AL;<isolated> 0648 0633 0644 0645;;;;N;;;;;
FDF9;ARABIC LIGATURE SALLA ISOLATED FORM;Lo;0;AL;<isolated> 0635 0644 0649;;;;N;;;;;
FDFA;ARABIC LIGATURE SALLALLAHOU ALAYHE WASALLAM;Lo;0;AL;<isolated> 0635 0644 0649 0020 0627 0644 0644 0647 0020 0639 0644 064A 0647 0020 0648 0633 0644 0645;;;;N;ARABIC LETTER SALLALLAHOU ALAYHE WASALLAM;;;;
FDFB;ARABIC LIGATURE JALLAJALALOUHOU;Lo;0;AL;<isolated> 062C 0644 0020 062C 0644 0627 0644 0647;;;;N;ARABIC LETTER JALLAJALALOUHOU;;;;
FDFC;RIAL SIGN;Sc;0;AL;<isolated> 0631 06CC 0627 0644;;;;N;;;;;
FDFD;ARABIC LIGATURE BISMILLAH AR-RAHMAN AR-RAHEEM;So;0;ON;;;;;N;;;;;
FDFE;ARABIC LIGATURE SUBHAANAHU WA TAAALAA;So;0;ON;;;;;N;;;;;
FDFF;ARABIC LIGATURE AZZA WA JALL;So;0;ON;;;;;N;;;;;
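
A rough way to run that check: collect the code points in this range that have a decomposition, then compare them with the glyphs that some GSUB `Ligature` entry produces. The `uniXXXX` glyph naming is an assumption based on the dump above, the helper names are hypothetical, and `targets_in_gsub` here holds only the one target visible in the excerpt, not the result of scanning the real font.

```python
import unicodedata


def decomposed_ligatures(first=0xFDF0, last=0xFDFC):
    """Map each code point in the range to its decomposition, if it has one."""
    table = {}
    for cp in range(first, last + 1):
        fields = unicodedata.decomposition(chr(cp)).split()
        parts = [int(f, 16) for f in fields if not f.startswith("<")]
        if parts:
            table[cp] = parts
    return table


def without_rule(table, ligature_targets):
    """Return code points whose assumed 'uniXXXX' glyph is not a ligature target."""
    return [cp for cp in table if f"uni{cp:04X}" not in ligature_targets]


table = decomposed_ligatures()
# Only the target seen in the Lookup 39 excerpt; a real check would scan all of GSUB.
targets_in_gsub = {"uniFDF2"}
missing = without_rule(table, targets_in_gsub)

for cp in missing:
    print(f"U+{cp:04X}", " ".join(f"{p:04X}" for p in table[cp]))
```

An entry in `missing` only means no rule was found in the set that was passed in; whether such a rule should exist is a separate question.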

@khaledhosny
Contributor

This is intentional, see #227, #192, and #41.

@khaledhosny
Contributor

As for the other ligatures mentioned, these are all legacy ligatures and the Unicode decomposition is a compatibility decomposition, which is a weak form of equivalence in Unicode and should not be generally taken as true equivalence (i.e. the decomposed sequence should not be mapped to the composed form automatically).
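
The difference between canonical and compatibility equivalence can be seen with Python's standard `unicodedata.normalize`. A small sketch, with `forms` as a helper name invented for this example:

```python
import unicodedata

LIGATURE = "\uFDF2"                    # ARABIC LIGATURE ALLAH ISOLATED FORM
SEQUENCE = "\u0627\u0644\u0644\u0647"  # 0627 0644 0644 0647


def forms(text):
    """Return the four Unicode normalization forms of `text`."""
    return {
        form: unicodedata.normalize(form, text)
        for form in ("NFC", "NFD", "NFKC", "NFKD")
    }


lig = forms(LIGATURE)
seq = forms(SEQUENCE)

# Canonical forms leave the ligature untouched.
print(lig["NFC"] == LIGATURE, lig["NFD"] == LIGATURE)
# Compatibility forms expand it to the four letters.
print(lig["NFKC"] == SEQUENCE, lig["NFKD"] == SEQUENCE)
# No form maps the four letters back to the ligature.
print(all(value == SEQUENCE for value in seq.values()))
```

The canonical forms (NFC, NFD) keep U+FDF2 as is; only the compatibility forms (NFKC, NFKD) expand it, and no normalization form turns the letter sequence into U+FDF2.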
