Managing roadmap translations #68

Closed
5 tasks done
tidoust opened this issue Sep 15, 2017 · 14 comments
@tidoust
Member

tidoust commented Sep 15, 2017

Some notes based on internal discussions for the Chinese translation of the mobile roadmap:

  • It seems more convenient to keep translations with the English content in the same repository.
  • In the end, we'll want to publish roadmaps on w3.org Web site, using content negotiation. Resulting files would be named .html.zh. However, that's not a very convenient way to work with HTML files on GitHub, so the idea is rather to use a zh subfolder in the mobile folder. We should be able to avoid having to modify relative links by using a <base> element. We'll create a publication script as needed before publication.
  • To track changes over time, we'll use git tags to create snapshots of the roadmap that are ready to be translated. This will not be perfect, because the comparison tool will report changes in data files, CSS/JS/Python files, in other roadmaps, and in the translated files themselves, that will need to be filtered out, but that should still be doable.
  • We should maintain a mapping of group and spec titles translations somewhere. Ideally, this would be done elsewhere and exposed via some other API. In practice, we'll probably have to do it on our own in JSON files.
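
The filtering step mentioned for tag comparisons could look like the sketch below. It is only an illustration, not tooling from this repository: the function name `filesToRetranslate`, the sample paths, and the idea of feeding it the output of `git diff --name-only` between two tags are assumptions, as is the folder layout (one folder per roadmap, translations in a language subfolder).

```javascript
// Hypothetical helper: from a list of changed paths (for instance the output
// of `git diff --name-only <old-tag> <new-tag>`), keep only the English HTML
// source files of one roadmap.
function filesToRetranslate(changedPaths, roadmap, translationFolders = ['zh']) {
  const prefix = roadmap + '/';
  return changedPaths.filter((path) => {
    // Ignore changes in other roadmaps and in shared folders
    if (!path.startsWith(prefix)) return false;
    const relative = path.slice(prefix.length);
    // Ignore the translated files themselves
    if (translationFolders.some((lang) => relative.startsWith(lang + '/'))) return false;
    // Ignore data, CSS, JS and Python files: only HTML content is translated
    return relative.endsWith('.html');
  });
}

// Sample paths are made up for the example
const changed = [
  'mobile/index.html',
  'mobile/zh/index.html',
  'mobile/graphics.html',
  'media/index.html',
  'data/webrtc.json',
  'js/generate.js',
  'tools/extract.py'
];
console.log(filesToRetranslate(changed, 'mobile'));
// prints [ 'mobile/index.html', 'mobile/graphics.html' ]
```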

Some things to do:

  • Create a first tag and make sure it's easy to retrieve the content of the roadmap
  • Add some "how to translate" documentation to the README
  • Create initial translations of spec titles and group names
  • Extract strings that need to be translated from the JS code
  • Adjust the JS code to fetch translations of spec titles and group names

tidoust added the i18n label Sep 15, 2017
@tidoust
Member Author

tidoust commented Sep 15, 2017

I created a first release (and tag) of the mobile roadmap through GitHub's interface. The content can be downloaded as an archive file, making it easy to translate the right set of HTML files.

@xfq
Member

xfq commented Sep 18, 2017

If we add <base href="../"> to mobile/en/index.html, it will be overridden by generate-index.js, and assets referenced in template-index will be 404.

If we add <base href="../"> to template-index (and template-page), then other roadmaps will be affected as well.

@xfq
Member

xfq commented Sep 18, 2017

Should we create language-specific JS files and template pages?

@tidoust
Member Author

tidoust commented Sep 18, 2017

Good point. I think we should rather adjust the JS to preserve the <base> directive (we might also want to preserve other directives such as <link> and <style>, actually, so that authors may add custom styles in their roadmaps).

@r12a

r12a commented Sep 27, 2017

> In the end, we'll want to publish roadmaps on w3.org Web site, using content-negotiation. Resulting files would be named .html.zh.

I never really understood why people continue to use filename.html.zh rather than filename.zh.html, which makes many things work more easily.

+1 though to using content negotiation

For i18n articles we have all translations at the same level, with a filename-data directory for shared resources (including information about translation status for each translation). You'll be able to see that on the W3C server if you look at the source of an article such as https://www.w3.org/International/questions/qa-site-conneg. This makes it much easier imho to manage things. We use a .var file so that users just type, and content authors just link to, the file name without an extension, and the server routes the request to the right file. You could however use multiviews, in which case having translated files in the same directory is probably necessary.
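
For readers unfamiliar with type maps: a .var file lists the variants of a resource together with their language. The sketch below generates one for files that follow the filename.xx.html convention. It assumes Apache's type-map format (`URI`, `Content-type` and `Content-language` headers, records separated by blank lines). The function name `buildTypeMap` and the `index` example are illustrative and not taken from the i18n setup.

```javascript
// Hypothetical generator for an Apache type map (.var file).
// Every language, English included, is assumed to use [name].[lang].html.
function buildTypeMap(name, languages) {
  const records = languages.map((lang) => [
    `URI: ${name}.${lang}.html`,
    'Content-type: text/html',
    `Content-language: ${lang}`
  ].join('\n'));
  // The first record names the negotiated resource itself
  return [`URI: ${name}`, ...records].join('\n\n') + '\n';
}

console.log(buildTypeMap('index', ['en', 'zh']));
```

The result would be saved next to the translated files, for instance as index.var, provided the server is configured to handle type maps.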

@r12a

r12a commented Sep 28, 2017

> We should maintain a mapping of group and spec titles translations somewhere. Ideally, this would be done elsewhere and exposed via some other API. In practice, we'll probably have to do it on our own in JSON files.

fwiw, for i18n articles we do that here: https://www.w3.org/International/articlelist

That page is content negotiated itself, and shows translated titles where translations exist (click on the language links, top right, to see how it works). Just an idea.

I strongly recommend, btw, that every page with a translation carries a link to that translation, on the page itself (for cases where content negotiation doesn't do what's needed).

@tidoust
Member Author

tidoust commented Sep 28, 2017

@r12a,

> I never really understood why people continue to use filename.html.zh rather than filename.zh.html, which makes many things work more easily.

Right, I suppose we can override existing content negotiation rules on w3.org Web site with custom .htaccess directives at the directory level.

> For i18n articles we have all translations at the same level

OK, I personally prefer to use dedicated folders to avoid seeing apparent duplicates for each file in the tree explorer of my text editor, but other people may see things differently ;)

> We should maintain a mapping of group and spec titles translations somewhere. Ideally, this would be done elsewhere and exposed via some other API. In practice, we'll probably have to do it on our own in JSON files.

> fwiw, for i18n articles we do that here: https://www.w3.org/International/articlelist

Right, I was thinking that a simple HTML page could be easier to maintain for translators but then we don't really need an HTML page in our case and building the actual mapping table would require small but custom tools. We could perhaps leverage good ol' code, such as gettext and PO files here.

It seems to me that it would be useful to maintain such mappings for specification titles and group names independently of this repository in any case. Newsletters we send (e.g. in Chinese) already include translations, but these translations are done on an ad-hoc basis, and not necessarily consistent from one newsletter to the other.

Thanks for the comments!

@r12a

r12a commented Sep 28, 2017

> Right, I suppose we can override existing content negotiation rules on w3.org Web site with custom .htaccess directives at the directory level.

I could be wrong, but afaia existing content negotiation rules on w3.org work equally well either way. I don't remember having to change anything for the i18n setup when we used multiviews, and i think it's not relevant for .var files.

@tidoust
Member Author

tidoust commented Sep 28, 2017

> I could be wrong, but afaia existing content negotiation rules on w3.org work equally well either way. I don't remember having to change anything for the i18n setup when we used multiviews, and i think it's not relevant for .var files.

And you're most probably right! I always assumed that only the .html.xx syntax was allowed for some reason...

@xfq, I think I still prefer putting files in separate folders but, since you're working on the first translation, feel free to decide otherwise and put all files in the same folder, using the .xx.html naming convention!

@r12a

r12a commented Sep 28, 2017

A couple more suggestions for consideration.

> I think I still prefer putting files in separate folders

The answer to this question may depend on how you plan to do content negotiation. If you plan to use multiviews i'm not sure how the server would find the right file if the user types https://example.org/filename. If you use type maps for content negotiation, the location is less important, but it's slightly more of a pain to create the .var file if the translations are in different directories.

Having the files in different directories, imo, also makes it more complicated when linking to either shared or localised images, script files, etc. For example, the translator would need to translate the URIs as well as the content. For the i18n articles we moved away from that at an early stage because it produced more validation work and more errors. Now the translator only needs to change a URL in the content on the rare occasion that they also localise an image.

@xfq
Member

xfq commented Sep 29, 2017

Personally, I also prefer putting translations in separate directories. But if it makes the content negotiation config much harder, I can live with putting them in the same directory as the English source files.

@tidoust
Member Author

tidoust commented Sep 29, 2017

> The answer to this question may depend on how you plan to do content negotiation. If you plan to use multiviews i'm not sure how the server would find the right file if the user types https://example.org/filename. If you use type maps for content negotiation, the location is less important, but it's slightly more of a pain to create the .var file if the translations are in different directories.

What we had in mind was to have a deploy script that would put files back into the same folder before they get published on w3.org.
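
As an illustration of what such a deploy script would have had to do, here is a minimal sketch of the path mapping only. The function name `publishedPath` and the list of language folders are assumptions. The .html.zh target follows the naming mentioned at the start of this issue.

```javascript
// Hypothetical mapping from a path in the repository to the path of the
// published file: "mobile/zh/index.html" becomes "mobile/index.html.zh".
function publishedPath(repoPath, languages = ['zh']) {
  const segments = repoPath.split('/');
  // Look for a folder (not the file name itself) that is a known language code
  const index = segments.findIndex(
    (segment, i) => i < segments.length - 1 && languages.includes(segment)
  );
  // No language folder: English source file, published under the same path
  if (index === -1) return repoPath;
  const [lang] = segments.splice(index, 1);
  return segments.join('/') + '.' + lang;
}

console.log(publishedPath('mobile/zh/index.html')); // prints mobile/index.html.zh
console.log(publishedPath('mobile/index.html'));    // prints mobile/index.html
```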

> Having the files in different directories, imo, also makes it more complicated when linking to either shared or localised images, script files, etc

Right, here we were planning to use a <base> tag to change the base URI of the document so that all links would continue to work.

All in all, the starting point was that we didn't want to have to work on documents named .html.xx because they would not be rendered as HTML documents by text editors and browsers by default. So we were ready to do extra work. Now, this discussion shows that this problem does not exist in practice since we can name documents .xx.html instead, so doing extra work just to keep the folder structure "clean" (for some definition of clean) is indeed probably not worth the hassle!

tidoust added a commit to tidoust/media-web-roadmap that referenced this issue Oct 20, 2017
Overhaul of the JS code to handle localized versions of roadmap documents.

As discussed in w3c#68, the code assumes that the following naming convention is
used for all localized files:

 [name].[lang].[ext]

English versions of the files are assumed to be named:

 [name].[ext]

To determine the language of the HTML page to render, the code checks the `lang`
attribute on the `<html>` tag. If it is not set, it tries to extract the language
from the page name (`window.location.pathname`), using the above convention. If
that fails as well, English (`en`) is assumed.
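
The detection order described above can be sketched as follows. This is not the code from the commit: the function name `detectLanguage` is hypothetical, and the `lang` attribute value and the pathname are passed as parameters (an assumption of this sketch) so that the logic can run outside a browser.

```javascript
// `htmlLang` stands for the value of the `lang` attribute on `<html>`,
// `pathname` for `window.location.pathname`.
function detectLanguage(htmlLang, pathname) {
  // 1. An explicit `lang` attribute wins
  if (htmlLang) return htmlLang;
  // 2. Otherwise look for the [name].[lang].[ext] convention in the file name
  const filename = pathname.split('/').pop();
  const parts = filename.split('.');
  if (parts.length >= 3) return parts[parts.length - 2];
  // 3. Otherwise assume English
  return 'en';
}

console.log(detectLanguage('', '/web-roadmaps/mobile/index.zh.html')); // prints zh
console.log(detectLanguage('', '/web-roadmaps/mobile/index.html'));    // prints en
console.log(detectLanguage('fr', '/web-roadmaps/mobile/index.html'));  // prints fr
```

Note that this sketch treats any middle segment as a language code. A stricter version would check it against a list of known languages.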

The page tries to load localized versions of all files. If not found, it falls
back to the English versions. The code issues warnings on the console when it
cannot find localized versions of a given file (in some cases, we may not need
localized versions, so warnings may not be warranted. However they are useful
during development to understand what still needs to be translated).
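
A minimal sketch of this fallback, under stated assumptions: `localizedName` and `loadLocalized` are hypothetical names, and `load` is a caller-supplied async function that resolves with the file contents or rejects when the file cannot be found. The actual code in the commit may be organized differently.

```javascript
// Turn "[name].[ext]" into "[name].[lang].[ext]"; English keeps the plain name
function localizedName(file, lang) {
  if (lang === 'en') return file;
  const dot = file.lastIndexOf('.');
  if (dot === -1) return file + '.' + lang;
  return file.slice(0, dot) + '.' + lang + file.slice(dot);
}

// Try the localized file first, then fall back to the English version,
// issuing a console warning when the localized version is missing.
async function loadLocalized(file, lang, load) {
  const localized = localizedName(file, lang);
  if (localized === file) return load(file);
  try {
    return await load(localized);
  } catch (err) {
    console.warn(`No "${lang}" version of ${file}, falling back to English`);
    return load(file);
  }
}

console.log(localizedName('toc.json', 'fr'));            // prints toc.fr.json
console.log(localizedName('js/translations.json', 'zh')); // prints js/translations.zh.json
```

In a browser, `load` could wrap `fetch` and reject when the response status is not OK.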

Files that may (and often should!) be localized are:
1. `toc.json`: the labels, descriptions and URLs should be localized
2. `*.html`: the actual contents of the roadmap obviously need to be localized.
Note the need to update the `lang` attribute of the `<html>` tag, otherwise the
code will consider that the page is in English.
3. `js/template-table-*.html`: the HTML templates used for generated tables.
(These templates used to be hardcoded in `generate.js`)
4. `js/translations.[lang].json`: the translations of features, group names,
spec titles, and labels that are needed to render the roadmaps. The file is a
semi-flat mapping table between the English string and the translated string.
First level properties are: "features" for feature names that appear in data
files, "groupNames" for translations of group names, "specTitles" for
translations of spec titles, "labels" for all other translations.

For instance, the beginning of a `js/translations.fr.json` document could be:

```json
{
  "labels": {
    "shipped": "C'est en prod!",
    "experimental": "On déboggue",
    "indevelopment": "On y travaille",
    "consideration": "On y réfléchit",
    "N/A": "Non renseigné",
    "in": "dans"
  },
  "features": {
    "audio element": "La balise audio",
    "picture element": "La balise picture"
  },
  "groupNames": {
    "CSS Working Group": "Le groupe de travail CSS"
  },
  "specTitles": {
    "Magnetometer": "Magnétomètre"
  }
}
```
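
A lookup against such a mapping table could be as simple as the sketch below. The function name `translate` is hypothetical, and returning the English string when no translation exists is an assumption of this sketch (it mirrors the file-level fallback described above).

```javascript
// Look up `text` in one of the first-level tables ("labels", "features",
// "groupNames", "specTitles") and fall back to the English string.
function translate(translations, category, text) {
  const table = translations[category] || {};
  return Object.prototype.hasOwnProperty.call(table, text) ? table[text] : text;
}

const fr = {
  labels: { shipped: "C'est en prod!" },
  specTitles: { Magnetometer: 'Magnétomètre' }
};
console.log(translate(fr, 'specTitles', 'Magnetometer'));      // prints Magnétomètre
console.log(translate(fr, 'groupNames', 'CSS Working Group')); // prints CSS Working Group
```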

The overhaul also adds support for merging features in generated tables when
they use the same "data-feature", even if they appear in different parts of the
documents (w3c#73).
@tidoust
Member Author

tidoust commented Oct 20, 2017

I implemented some i18n logic in the JS code in PR #110. In the end, I followed the naming convention suggested by @r12a, and the code assumes that this naming convention is being used.

One thing that continues to bug me: it seems very convenient to be able to retain HTML files without language indication, typically index.html so that https://w3c.github.io/web-roadmaps/mobile/ returns the English version. However, this prevents content negotiation in practice, because we would then need index.html to mean "the localized version that matches the user's language" and thus the English version would need to be in an index.en.html.

I do not know if there is an easy way to do that on pages published by GitHub. Through Jekyll perhaps. In the meantime, the code I prepared follows a "file without language indication is the English version" rule, and thus content negotiation is not yet doable.
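
For reference, the choice that content negotiation makes on the server can be approximated as in the sketch below. It is only a simplified illustration: `pickLanguage` is a hypothetical name, the matching is much cruder than what a real server does, and nothing here is part of the code I prepared.

```javascript
// Pick the best available language for an Accept-Language header value,
// e.g. "zh-CN,zh;q=0.9,en;q=0.8". `available` holds lowercase language codes.
function pickLanguage(acceptLanguage, available, fallback = 'en') {
  const preferences = acceptLanguage
    .split(',')
    .map((item) => {
      const [tag, ...params] = item.trim().split(';');
      const q = params.map((p) => p.trim()).find((p) => p.startsWith('q='));
      return { tag: tag.trim().toLowerCase(), q: q ? parseFloat(q.slice(2)) : 1 };
    })
    // Drop empty entries, q=0 and unparsable weights
    .filter((pref) => pref.tag && pref.q > 0)
    .sort((a, b) => b.q - a.q);
  for (const { tag } of preferences) {
    // Accept an exact match or a match on the primary subtag ("zh-cn" -> "zh")
    const primary = tag.split('-')[0];
    const match = available.find((lang) => lang === tag || lang === primary);
    if (match) return match;
  }
  return fallback;
}

console.log(pickLanguage('zh-CN,zh;q=0.9,en;q=0.8', ['en', 'zh'])); // prints zh
console.log(pickLanguage('fr', ['en', 'zh']));                      // prints en
```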

(FYI, the code I prepared does not yet list and add links to available translations either)

@tidoust
Member Author

tidoust commented Apr 26, 2018

I created #248 to keep track of the content negotiation. I'm closing this issue because the rest should now be done.

tidoust closed this as completed Apr 26, 2018
3 participants