
Conversation

@linearcombination
Contributor

This is a rather large PR. I can split it into smaller PRs if you like, but if it isn't too much trouble to review, I'm submitting it in its current form to save the time of splitting it up. Each commit message is carefully crafted, so hopefully each commit stands on its own clearly. For your sake I really do prefer smaller PRs, but this time the PRs, if separated, would be stacked, and the review/merge cycle would take many days since each depends on the others in some way, with a few exceptions. Again, I am glad to split it if needed: I can cherry-pick into different branches and give a succession of PRs one right after the other over several days, with a few that could run simultaneously. Here are some of the notable things this PR includes:

  • Fix for search by lang code in STET
  • Add French STET input doc and associated localized phrases for output doc and supporting test
  • Fix for a bug where translation words could be repeated in the translation words section.
  • Add a (much) superior PDF converter option, chosen through the UI (PrinceXML under its non-commercial license; the feature is crafted to comport with that license), while retaining the existing option. Please try this when deployed and see what you think. It is amazingly faster and has much better layout, at the cost of a small logo on the first page of the PDF output. It can easily handle the whole Bible plus helps in many languages.
  • Fix missing fonts in many languages through recrafted CSS directives and font libraries
  • A lot of QA for languages that have not been exercised much until now (this led to much of the above)
  • Ability to choose different layouts (one or two column) in the UI for TN and TQ resources, since some resources render well in one layout but not the other. E.g., Khmer did not render well in our prior default two-column layout, but is fine in one column. The feature includes a small informational help button to guide the user.
  • Display books for languages which have no USFM resource but have some other resource, e.g., TN, TQ, etc. This was a regression from a previous refactor that did not have an automated test associated with it. Use of GPT help makes these types of regressions more possible while at the same time being a multiplier, so I am exercising more care to cover key specifications with even more testing.
  • Several under-the-hood refactorings where they were needed, along with tests to ensure desired behavior
  • Update the list of language/resource/book combos that we provide runtime USFM structural fixes for, to comport with changes in data coming from the data API
  • Handle the case where the data API may return a null resource type, which led to a loading failure and thus took down the DOC/STET/Passages websites. DOC will now exclude such cases from consideration, at the cost of hiding the flaw in the data, but the sites keep running. Discovery of such data problems will now happen by noticing that a language thought to be provided is missing, followed by investigation of the data upstream. Validation was nice in that it caught this problem immediately, but it was too disruptive. Validation still occurs for all other aspects of data reification except this one, since it "stopped the show". That said, there could be other cases that we will discover one day.
  • Cache data API results for a half hour. This greatly speeds up the three apps and isn't a long enough duration to cause staleness issues when others are testing.
  • Dynamic headers have been added to PDF output showing the book name and chapter, at top center and top right. Work is still to come for the same on DOCX.
  • Adjustment to the acquisition of data to support deeper localization of book names. It now does more work, accessing more known locations for this data in the order thought best: 1) USFM metadata, 2) repo manifest, 3) front/title.txt, 4) lastly, English sources that always exist. During QA this deeper search for localization data has resulted in much improved localization of book names for many languages; a sketch of the fallback chain follows this list.
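To make the lookup order concrete, here is a minimal sketch of the fallback chain. The helper names are hypothetical stand-ins, not DOC's actual functions; only the priority order comes from the summary above.

from typing import Callable, Optional

# Hypothetical helpers standing in for DOC's actual lookups; each returns a
# localized name, or None if that location has nothing for the book.
def name_from_usfm_metadata(lang_code: str, book_code: str) -> Optional[str]:
    return None  # stub

def name_from_repo_manifest(lang_code: str, book_code: str) -> Optional[str]:
    return None  # stub

def name_from_title_txt(lang_code: str, book_code: str) -> Optional[str]:
    return None  # stub

ENGLISH_BOOK_NAMES = {"gen": "Genesis"}  # English sources always exist

def localized_book_name(lang_code: str, book_code: str) -> str:
    """Try each known location in priority order, falling back to English."""
    sources: list[Callable[[str, str], Optional[str]]] = [
        name_from_usfm_metadata,  # 1) USFM metadata
        name_from_repo_manifest,  # 2) repo manifest
        name_from_title_txt,      # 3) front/title.txt
    ]
    for source in sources:
        name = source(lang_code, book_code)
        if name:
            return name
    return ENGLISH_BOOK_NAMES[book_code]  # 4) English fallback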

The issue fixed is that some languages have book codes because they
have non-USFM resources associated with those books, but they do not
have any USFM; we still need to show the books available from these
non-USFM resources. This was a regression from a few months ago that I
didn't discover until testing the zh language.
The user can choose to use the faster and better HTML-to-PDF converter
PrinceXML, as long as they are OK with it putting a PrinceXML logo on
the first page of the PDF (as per the PrinceXML non-commercial
license). A sketch of the converter dispatch follows these notes.
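A minimal sketch of the shape of the fix, assuming records carry book_code and resource_type fields (the field names and function body are assumptions, not DOC's actual code): list a book if any resource covers it, rather than only USFM.

def available_book_codes(resources: list[dict]) -> set[str]:
    # The regression effectively filtered on resource_type == "usfm" here,
    # so books backed only by TN, TQ, etc. disappeared. Collect codes from
    # every resource instead.
    return {r["book_code"] for r in resources}

# e.g. a record set with TN/TQ but no USFM still yields its book codes:
print(available_book_codes([
    {"book_code": "gen", "resource_type": "tn"},
    {"book_code": "exo", "resource_type": "tq"},
]))  # {'gen', 'exo'} (set order may vary)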

* Default to not using PrinceXML for HTML to PDF conversion

  The assumption here (which may change later) is that users won't want
  the Prince PDF logo on the first page of the PDF (which is required by
  the non-commercial license).

* Add a backend test for using PrinceXML

* Update frontend tests and add a frontend test for PrinceXML
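For reference, a sketch of how the dispatch might look. render_with_existing_converter is a placeholder for the pre-existing path; only the prince CLI invocation reflects Prince's actual command-line usage.

import subprocess
from pathlib import Path

def render_with_existing_converter(html_path: Path, pdf_path: Path) -> None:
    raise NotImplementedError("stand-in for the pre-existing HTML-to-PDF path")

def html_to_pdf(html_path: Path, pdf_path: Path, use_prince: bool = False) -> None:
    """Dispatch on the user's UI choice; defaults to the existing converter
    since the non-commercial license puts a Prince logo on page one."""
    if use_prince:
        # Prince's CLI: prince input.html -o output.pdf
        subprocess.run(["prince", str(html_path), "-o", str(pdf_path)], check=True)
    else:
        render_with_existing_converter(html_path, pdf_path)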
This is a questionable change because it means that we don't catch
invalid data in the data API as quickly for the case when
content.resource_type is null (which is an invalid state). However,
without this commit, if the data API ever again returns a null
content.resource_type, it zombies this app and the others that depend
on it as an API, which is bad for users. After this commit, this type
of error will only be detected by a DOC user if they were expecting to
see the offending record and notice it isn't present in the results.
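A sketch of the exclusion, assuming records are dicts with a resource_type key (the real reification code surely differs):

from typing import Any

def usable_records(records: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Silently drop records with a null resource_type so one bad record
    can't zombie the whole app; validation still covers everything else."""
    return [r for r in records if r.get("resource_type") is not None]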
This really speeds up DOC (and thus the STET and Passages apps). The
cache duration is set to 3 minutes to approximately cover one user
interaction span, which greatly improves the perceived speed of the
app(s).
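One common way to get this behavior in Python is a TTL cache; here is a sketch using the cachetools library (the actual caching layer in DOC may differ, and the function name is illustrative):

import requests
from cachetools import TTLCache, cached

# Entries expire after 3 minutes, roughly one user interaction span.
@cached(cache=TTLCache(maxsize=256, ttl=3 * 60))
def fetch_from_data_api(url: str) -> dict:
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return response.json()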
Self-documenting
Forgot to commit this an update or two back
Also add conditional display of settings based on the resources the user has chosen
Some changes in the data API make it possible to bring some backend
tests back online.
Put the Bible book name at top center and the chapter number at top
right.
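Paged-media engines such as PrinceXML support running headers through CSS string-set and @page margin boxes; a sketch of the directives (the selectors are illustrative, not DOC's actual markup):

# CSS Paged Media sketch: capture the book name and chapter heading text,
# then emit them in the page's top-center and top-right margin boxes.
RUNNING_HEADER_CSS = """
h1.book-name { string-set: book content(text); }
h2.chapter   { string-set: chapter content(text); }

@page {
    @top-center { content: string(book); }
    @top-right  { content: string(chapter); }
}
"""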
It occurred with the zh language at one point, but may have been
transient.
This occurred once for the ont language and ulb resource but was later
rectified, so it may have been data API related, as it seemed to right
itself.
Dzongkha (bo) precipitated this in that it wouldn't render the proper
font in PDF. This commit solves that and should solve many others like
it.
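The usual paged-media fix is to register script-specific fonts and widen the fallback stack; an illustrative sketch (the font files and family names are assumptions):

# Illustrative @font-face registration plus a fallback stack so scripts such
# as Tibetan pick up a font that actually covers their code points in PDF.
FONT_FALLBACK_CSS = """
@font-face {
    font-family: "Noto Serif Tibetan";
    src: url("fonts/NotoSerifTibetan-Regular.ttf");
}

body {
    font-family: "Noto Serif", "Noto Serif Tibetan", "Noto Sans Khmer", serif;
}
"""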
Changes in results returned by data API
Add test for search by lang_code in STET
@PurpleGuitar
Contributor

Hi Lang, I do prefer smaller PRs in general, but no need to break this one up.

It will probably take a little time for me to get through it, but I'm looking forward to reviewing it.

Thanks for your hard work!

(...)
>>> result[0]
('gen', 'Gênesis')
>>> from doc.domain.resource_lookup import book_codes_for_lang

I always enjoy seeing Python doctests. They're such an elegant way of expressing the function's contract.

}


def normalize_localized_book_name(localized_book_name: str) -> str:

This is a clever way to be generous in how we receive book names from the field.
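For readers without the diff handy, a guess at the spirit of such a normalization: trim, collapse whitespace, and Unicode-normalize so slightly messy field data still matches. The real function's behavior may differ.

import unicodedata

def normalize_localized_book_name(localized_book_name: str) -> str:
    """Sketch only: be generous with field input by collapsing stray
    whitespace and normalizing Unicode composition before matching."""
    collapsed = " ".join(localized_book_name.split())
    return unicodedata.normalize("NFC", collapsed)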

@PurpleGuitar merged commit 7998ee8 into doc-dev.walink.org on Aug 26, 2025
15 checks passed
github-project-automation bot moved this from In progress to Done in DOC Project Management on Aug 26, 2025
