
Conversation

@linearcombination
Contributor

This is a rather large PR. I can split it into smaller PRs if you like, but if it isn't too much trouble to review, I'm submitting it in its current form to save the time of splitting it up. Each commit message is carefully crafted, so hopefully each commit stands on its own clearly. For your sake I really do prefer smaller PRs, but this time the PRs, if separated, would be stacked, and the review/merge cycle would take many days since each depends on the others in some way, with a few exceptions. Again, I am glad to split it if needed: I can cherry-pick into different branches and give a succession of PRs one right after the other over several days, with a few that could run simultaneously. Here are some of the notable things this PR includes:

  • Fix for search by lang code in STET
  • Add French STET input doc and associated localized phrases for output doc and supporting test
  • Fix for a bug where translation words could be repeated in the translation words section.
  • Add a (much) superior PDF converter option, chosen through the UI (PrinceXML under its non-commercial license; the feature is crafted to comport with that license), while retaining the existing option. Please try this when deployed and see what you think. It is amazingly faster and has much better layout, at the cost of a small logo on the first page of the PDF output. It can easily handle the whole Bible plus helps in many languages.
  • Fix missing fonts in many languages through recrafted CSS directives and font libraries
  • A lot of QA for languages that have not been exercised much until now (this led to much of the above)
  • Ability to choose different layouts (one or two column) in the UI for TN and TQ resources, since some resources render well in one layout but not the other. E.g., Khmer did not render well in our prior default two-column layout, but is fine in one column. The feature includes a small informational help button to guide the user.
  • Display books for languages which have no USFM resource but have some other resource, e.g., TN, TQ, etc. This was a regression from a previous refactor that did not have an automated test associated with it. Use of GPT help makes these types of regressions more possible while at the same time being a multiplier, so I am exercising more care to cover key specifications with even more testing.
  • Several under-the-hood refactorings where they were needed, along with tests to ensure desired behavior
  • Update the list of language/resource/book combos that we provide runtime USFM structural fixes for, to comport with changes in data coming from the data API
  • Handle the case where the data API may return a null resource type, which led to a loading failure and thus took down the DOC/STET/Passages websites. DOC will now exclude such cases from consideration, at the cost of hiding the flaw in the data, but the sites keep running. Discovery of such data problems will now happen by noticing that a language thought to be provided is missing, followed by investigation of the data upstream. Validation was nice in that it caught this problem immediately, but it was too disruptive. Validation still occurs for all other aspects of data reification except this one, since it "stopped the show". That said, there could be other cases that we will discover one day.
  • Cache data API results for a half hour. This greatly speeds up the three apps and isn't a long enough duration to cause staleness issues when others are testing.
  • Dynamic headers have been added to PDF output showing the book name and chapter, at top center and top right. Work is still to come for the same on DOCX.
  • Adjustment to the acquisition of data to support deeper localization of book names. It now does more work, accessing more known locations for this data in the order thought best: 1) USFM metadata, 2) repo manifest, 3) front/title.txt, 4) lastly, English sources that always exist. During QA this deeper search for localization data has resulted in much improved localization of book names for many languages; a sketch of the fallback chain follows this list.
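To make the lookup order concrete, here is a minimal sketch of the fallback chain. The helper names are hypothetical stand-ins, not DOC's actual functions; only the priority order comes from the summary above.

from typing import Callable, Optional

# Hypothetical helpers standing in for DOC's actual lookups; each returns a
# localized name, or None if that location has nothing for the book.
def name_from_usfm_metadata(lang_code: str, book_code: str) -> Optional[str]:
    return None  # stub

def name_from_repo_manifest(lang_code: str, book_code: str) -> Optional[str]:
    return None  # stub

def name_from_title_txt(lang_code: str, book_code: str) -> Optional[str]:
    return None  # stub

ENGLISH_BOOK_NAMES = {"gen": "Genesis"}  # English sources always exist

def localized_book_name(lang_code: str, book_code: str) -> str:
    """Try each known location in priority order, falling back to English."""
    sources: list[Callable[[str, str], Optional[str]]] = [
        name_from_usfm_metadata,  # 1) USFM metadata
        name_from_repo_manifest,  # 2) repo manifest
        name_from_title_txt,      # 3) front/title.txt
    ]
    for source in sources:
        name = source(lang_code, book_code)
        if name:
            return name
    return ENGLISH_BOOK_NAMES[book_code]  # 4) English fallback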

The issue fixed is that some languages have book codes because they
have non-USFM resources associated with those books, but they do not
have any USFM; we still need to show the books available from these
non-USFM resources. This was a regression from a few months ago that I
didn't discover until testing the zh language.
The user can choose to use the faster and better HTML-to-PDF converter
PrinceXML, as long as they are OK with it putting a PrinceXML logo on
the first page of the PDF (as per the PrinceXML non-commercial
license). A sketch of the converter dispatch follows these notes.
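A minimal sketch of the shape of the fix, assuming records carry book_code and resource_type fields (the field names and function body are assumptions, not DOC's actual code): list a book if any resource covers it, rather than only USFM.

def available_book_codes(resources: list[dict]) -> set[str]:
    # The regression effectively filtered on resource_type == "usfm" here,
    # so books backed only by TN, TQ, etc. disappeared. Collect codes from
    # every resource instead.
    return {r["book_code"] for r in resources}

# e.g. a record set with TN/TQ but no USFM still yields its book codes:
print(available_book_codes([
    {"book_code": "gen", "resource_type": "tn"},
    {"book_code": "exo", "resource_type": "tq"},
]))  # {'gen', 'exo'} (set order may vary)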

* Default to not using PrinceXML for HTML to PDF conversion

  The assumption here (which may change later) is that users won't want
  the Prince PDF logo on the first page of the PDF (which is required by
  the non-commercial license).

* Add a backend test for using PrinceXML

* Update frontend tests and add a frontend test for PrinceXML
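For reference, a sketch of how the dispatch might look. render_with_existing_converter is a placeholder for the pre-existing path; only the prince CLI invocation reflects Prince's actual command-line usage.

import subprocess
from pathlib import Path

def render_with_existing_converter(html_path: Path, pdf_path: Path) -> None:
    raise NotImplementedError("stand-in for the pre-existing HTML-to-PDF path")

def html_to_pdf(html_path: Path, pdf_path: Path, use_prince: bool = False) -> None:
    """Dispatch on the user's UI choice; defaults to the existing converter
    since the non-commercial license puts a Prince logo on page one."""
    if use_prince:
        # Prince's CLI: prince input.html -o output.pdf
        subprocess.run(["prince", str(html_path), "-o", str(pdf_path)], check=True)
    else:
        render_with_existing_converter(html_path, pdf_path)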
This is a questionable change because it means that we don't catch
invalid data in the data API as quickly for the case when
content.resource_type is null (which is an invalid state). However,
without this commit, if the data API ever again returns a null
content.resource_type, it zombies this app and the others that depend
on it as an API, which is bad for users. After this commit, this type
of error will only be detected by a DOC user if they were expecting to
see the offending record and notice it isn't present in the results.
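A sketch of the exclusion, assuming records are dicts with a resource_type key (the real reification code surely differs):

from typing import Any

def usable_records(records: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Silently drop records with a null resource_type so one bad record
    can't zombie the whole app; validation still covers everything else."""
    return [r for r in records if r.get("resource_type") is not None]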
This really speeds up DOC (and thus the STET and Passages apps). The
cache duration is set to 3 minutes to approximately cover one user
interaction span, which greatly improves the perceived speed of the
app(s).
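One common way to get this behavior in Python is a TTL cache; here is a sketch using the cachetools library (the actual caching layer in DOC may differ, and the function name is illustrative):

import requests
from cachetools import TTLCache, cached

# Entries expire after 3 minutes, roughly one user interaction span.
@cached(cache=TTLCache(maxsize=256, ttl=3 * 60))
def fetch_from_data_api(url: str) -> dict:
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return response.json()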
Self-documenting
Forgot to commit this an update or two back
Also add conditional display of settings based on the resources the user has chosen
Some changes in the data API make it possible to bring some backend
tests back online.
Put the Bible book name at top center and the chapter number at top
right.
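Paged-media engines such as PrinceXML support running headers through CSS string-set and @page margin boxes; a sketch of the directives (the selectors are illustrative, not DOC's actual markup):

# CSS Paged Media sketch: capture the book name and chapter heading text,
# then emit them in the page's top-center and top-right margin boxes.
RUNNING_HEADER_CSS = """
h1.book-name { string-set: book content(text); }
h2.chapter   { string-set: chapter content(text); }

@page {
    @top-center { content: string(book); }
    @top-right  { content: string(chapter); }
}
"""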
It occurred with the zh language at one point, but may have been
transient.
This occurred once for the ont language and ulb resource but was later
rectified, so it may have been data API related, as it seemed to right
itself.
Dzongkha (bo) precipitated this in that it wouldn't render the proper
font in PDF. This commit solves that and should solve many others like
it.
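The usual paged-media fix is to register script-specific fonts and widen the fallback stack; an illustrative sketch (the font files and family names are assumptions):

# Illustrative @font-face registration plus a fallback stack so scripts such
# as Tibetan pick up a font that actually covers their code points in PDF.
FONT_FALLBACK_CSS = """
@font-face {
    font-family: "Noto Serif Tibetan";
    src: url("fonts/NotoSerifTibetan-Regular.ttf");
}

body {
    font-family: "Noto Serif", "Noto Serif Tibetan", "Noto Sans Khmer", serif;
}
"""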
Changes in results returned by data API
Add test for search by lang_code in STET
@PurpleGuitar
Contributor

Hi Lang, I do prefer smaller PRs in general, but no need to break this one up.

It will probably take a little time for me to get through it, but I'm looking forward to reviewing it.

Thanks for your hard work!

(...)
>>> result[0]
('gen', 'Gênesis')
>>> from doc.domain.resource_lookup import book_codes_for_lang

I always enjoy seeing Python doctests. They're such an elegant way of expressing the function's contract.

}


def normalize_localized_book_name(localized_book_name: str) -> str:

This is a clever way to be generous in how we receive book names from the field.
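For readers without the diff handy, a guess at the spirit of such a normalization: trim, collapse whitespace, and Unicode-normalize so slightly messy field data still matches. The real function's behavior may differ.

import unicodedata

def normalize_localized_book_name(localized_book_name: str) -> str:
    """Sketch only: be generous with field input by collapsing stray
    whitespace and normalizing Unicode composition before matching."""
    collapsed = " ".join(localized_book_name.split())
    return unicodedata.normalize("NFC", collapsed)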

@PurpleGuitar merged commit 7998ee8 into doc-dev.walink.org on Aug 26, 2025
15 checks passed
github-project-automation bot moved this from In progress to Done in DOC Project Management on Aug 26, 2025
