DOC, STET, and Passages update #247

linearcombination · 2025-04-02T20:06:23Z

DOC:

Deeper localization for book names and chapter names
Optional chapter names or no chapter names (just chapter numbers) - selectable on Settings page of UI
More consistent formatting across heterogeneous source USFM
Better performance through:
- Reorganizing workload amongst Celery background tasks so that they can use the same shared volumes for assets which allows for more caching to occur
- More aggressive caching in a few more/new places
- Batching of git clone calls to one cli process rather than a forked process for every clone. This saves on resource allocation and teardown for all those OS level processes and thus should improve memory use
Better module organization for consistency amongst DOC, STET, and Passages apps
Fix bug in 1st page title when multiple books chosen
See git logs for more

STET:

Fixes all known issues and feature requests for STET as reported by PO
Handle bolding of multiple supplied words in source text. The words to bold are supplied in the 4th column of the STET input doc.

Passages:

Adds the ability to add all NT Survey Reviewer's Guide passages at once in the UI, as requested by PO


USFM in a repo comes in mostly two layouts:
- one file per book, e.g., 1-GEN.usfm, or
- one directory per chapter, with one file per verse span in that directory (which we pull together into one file per book)
The latter layout provides a little more localization via <root>/front/title.txt and <root>/<chapter>/title.txt files, which in turn often results in different (more complete) use of USFM chapter labels. Because of this, the two layouts have to be handled differently when splitting into chapters: the regex that the splitting hinges on is different for each case.
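To illustrate why the split regex depends on the layout, here is a minimal sketch. The layout keys, both patterns, and the function name are assumptions made for illustration; the project's actual regexes are not shown in this PR. The sketch assumes the per-chapter layout yields a chapter marker followed by a localized chapter label line.

```python
import re

# Hypothetical patterns, one per repo layout (not the project's real ones).
CHAPTER_START = {
    # One file per book: a chapter begins at a bare chapter marker, e.g. "\c 3".
    "file_per_book": re.compile(r"^\\c[ \t]+(\d+)[ \t]*$", re.MULTILINE),
    # One directory per chapter: assume a chapter marker immediately followed
    # by a localized chapter label line, e.g. "\c 3" then "\cl Chapitre 3".
    "dir_per_chapter": re.compile(
        r"^\\c[ \t]+(\d+)[ \t]*\n\\cl[ \t]+.*$", re.MULTILINE
    ),
}


def split_chapters(usfm: str, layout: str) -> dict[int, str]:
    """Return a mapping of chapter number to that chapter's USFM text."""
    matches = list(CHAPTER_START[layout].finditer(usfm))
    chapters: dict[int, str] = {}
    for i, match in enumerate(matches):
        # A chapter runs from its own start to the start of the next chapter.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(usfm)
        chapters[int(match.group(1))] = usfm[match.start():end]
    return chapters
```

Anything before the first chapter start (such as the \id line) is left out of the result in this sketch.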


This is so because worker threads in Celery in a Docker environment have their own copy of the container, including its file system. If you first request assets via git in a non-worker thread and then have to request them again in a later step (e.g., get languages, get books), and those actions do not share the same file system, you have to clone the repos again. If both steps run as worker threads, they execute in the copy of the container used by the worker, so git does not have to clone the repos again: they were already acquired in an earlier step in the UI.
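The benefit of a shared file system can be sketched as a clone that is skipped when the repo is already present. This is only an illustration under assumptions: the function name, the directory naming rule, and the shallow-clone flag are hypothetical and not taken from the project.

```python
import subprocess
from pathlib import Path


def ensure_repo(url: str, assets_dir: Path) -> tuple[Path, bool]:
    """Clone url into assets_dir unless a clone is already there.

    Returns the repo path and whether a clone was actually performed.
    """
    # Hypothetical naming rule: use the last URL segment without ".git".
    repo_name = url.rstrip("/").split("/")[-1].removesuffix(".git")
    repo_dir = assets_dir / repo_name
    if (repo_dir / ".git").is_dir():
        # Cache hit: an earlier step on the same file system already cloned it.
        return repo_dir, False
    assets_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(
        ["git", "clone", "--depth", "1", url, str(repo_dir)], check=True
    )
    return repo_dir, True
```

The cache hit only happens when the earlier and later steps see the same assets_dir, which is the point of running both as worker tasks on the same shared volume.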


Because docx layout is configurable, there is currently no consistent docx styled element that a template header could count on being there. This would mean that when such an element is not there, you get an error in the header instead. Eventually maybe we will have dynamic header logic for docx.


linearcombination added 30 commits February 27, 2025 10:22

Deeper localization for book names and chapter names

9cfeb92

Languages whose repos are laid out one directory per chapter get deeper localization of book names and chapter names with this update.

WIP for improved bolding in STET

42d9c1b

TS will now optionally provide an additional column where the words that should be bolded in the source text are listed.

Remove unused import

eb7e1e5

Better progress indicator messages

eb878de

Minor (automatic) code formatting

974f743

Deeper localization for book names and chapter names

72fc875

A bit of refactoring to share common code

44be4a3

Move common code to shared function

5343f0d

Add abu to languages with USFM defects

a7c25d4

abu language has repeated USFM chapter markers

Updated stet_en.docx input document for STET

53ece64

Added column containing words to bold

Better module organization

e68fb4e

DOC, STET, and Passages should be at the same level and separate

Better module organization and naming

ee68a18

Put DOC, STET, and Passages modules at same level

Move reviewer's guide tests into own module

c60032c

Change module name

733d316

So that it doesn't clash with the system package of the same name

Remove some pytest marks

c72deed

Not currently using focus mark on these tests

Sort imports

c05667a

Fix bug in function and improve organization

a65a027

First language title for display on first page of generated document only included the first book of potentially many chosen. Fixed.

Improve source comment

52affa8

To clarify that failure was intended

Updates to logging output

3f2a09f

Remove unused code and comments

4e76be9

Only add passage if not already in store

b24bf7b

This is important because in the Passages app, the user might choose to add the NT Survey Reviewer's Guide passages en masse more than once in the UI, and we would not want the passages to be duplicated
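The idea of adding only passages that are not yet in the store can be sketched as follows. The real store most likely lives in the frontend and is not shown in this PR, so the Passage shape and the function name here are hypothetical; this is just the deduplication logic in Python.

```python
from typing import NamedTuple


class Passage(NamedTuple):
    # Hypothetical shape; the real passage model is not shown in this PR.
    book_code: str
    chapter: int
    verses: str


def add_passages(store: list[Passage], incoming: list[Passage]) -> list[Passage]:
    """Append only the passages not already in the store, keeping order."""
    seen = set(store)
    for passage in incoming:
        if passage not in seen:
            store.append(passage)
            seen.add(passage)
    return store
```

With this approach, adding the same batch a second time leaves the store unchanged.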

Switch to Pydantic BaseModel as subclass for RG models

9b87b52

In case we want to serialize to JSON a full RGBook in the future using FastAPI/Pydantic auto-encoding.

Improve source code comment

a1cb893

Source code formatting

78b7648

Tighten up types on function

4dc6b00

Micro refactoring

4cd4f7f

Add ability to add all NT Survey RG passages at once to Passages app

9ce8eb8

Requested by TS team

Update doctype tests

ad08edd

Update docker containers for python, node, nginx

6dd58c2

Also update a few python and node packages

linearcombination added 22 commits March 25, 2025 17:15

Small improvements

68316a7

Add a heart language with USFM defects to list

c5988ef

Tweaks to chapter splitting and chapter labels and markers

520b246

Working to enforce consistency in generated USFM whether that USFM is the one file per book variety or the file per verse variety.

Put USFM chapter markers before chapter labels

c8998c7

This plays better with the splitting algo by simplifying it. Now getting consistent localized chapter labels when USE_CHAPTER_LABELS is True and when USE_LOCALIZED_CHAPTER_LABEL is True.

Add another bem book with USFM defects

976dc48

Remove old commented out code

48702a3

Use function params again rather than magic strings

ed8d6ca

Remove old source comment

09d084d

Add book name at beginning of each chapter in two language docx layout

45762ed

Match PDF layout

Add a couple source comments

3af7059

Remove header from compact docx template

7ae55d0

Same reason as given in the commit log for the other docx template two commits ago

Fix some failing tests

ec2b0f0

Inoculate against improper chapter label input data

4853f23

Some languages, e.g., byn-reg-dan, have incorrect content in their <chapter_dir>/title.txt files: a raw USFM chapter marker such as \c 1, where the chapter label text (sans USFM marker) should have been.
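A guard against this kind of input could look like the sketch below. The function name, the fallback word, and the rule "anything starting with a backslash is not a usable label" are assumptions for illustration, not the project's actual logic.

```python
def clean_chapter_label(
    raw: str, chapter_num: int, fallback_word: str = "Chapter"
) -> str:
    """Return a usable chapter label from the contents of a title.txt file.

    If the file is empty or holds a raw USFM marker (e.g. "\\c 1") instead of
    label text, fall back to "<fallback_word> <chapter_num>".
    """
    text = raw.strip()
    if not text or text.startswith("\\"):
        # Assumed rule: a leading backslash means a USFM marker, not a label.
        return f"{fallback_word} {chapter_num}"
    return text
```

For example, clean_chapter_label("\\c 1", 1) falls back to the generated label, while a file containing real label text is returned as-is (stripped of surrounding whitespace).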

Make it possible to toggle chapter labels through UI

27138e1

Added the ability to toggle chapter labels in DOC UI under a new optional settings area.

Batch all git clones together into one process

f6f91d2

This should help performance both in terms of speed and in memory use because it is costly in Python to spin up a process for every repo clone we need to do when instead we can batch them into one cli call, i.e., into one process dispatch.
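One way to batch clones into a single process dispatch is to join them into one shell command line. This is a sketch under assumptions: the function names, the use of sh, the ";" separator, and the shallow-clone flag are all hypothetical and may differ from what the commit actually does.

```python
import shlex
import subprocess


def build_batched_clone_command(repos: list[tuple[str, str]]) -> str:
    """Build one shell command line that clones every (url, dest) pair."""
    parts = [
        f"git clone --depth 1 {shlex.quote(url)} {shlex.quote(dest)}"
        for url, dest in repos
    ]
    # ";" keeps going even if one clone fails; "&&" would stop at the first failure.
    return " ; ".join(parts)


def clone_all(repos: list[tuple[str, str]]) -> None:
    """Dispatch all clones from Python with a single subprocess call."""
    if repos:
        subprocess.run(["sh", "-c", build_batched_clone_command(repos)], check=False)
```

Note that the shell still launches git once per repository; what this sketch saves is the per-clone process dispatch from Python.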

Comment out several debug statements

5f354d8

Remove obsolete source code comment

65c5dcb

Include chapter num when creating chapter label

b2c7bea

Increase test timeout for github actions server

868fafa

github actions server is slooooow

Tidy up source code comments

e200152

linearcombination requested a review from PurpleGuitar April 2, 2025 20:06

linearcombination added this to DOC Project Management Apr 4, 2025

github-project-automation bot moved this to In progress in DOC Project Management Apr 4, 2025

Update STET input docs provided by PO

68bf1e7

PurpleGuitar approved these changes Apr 4, 2025


PurpleGuitar merged commit 6bb6c52 into doc-dev.walink.org Apr 4, 2025
6 of 7 checks passed

github-project-automation bot moved this from In progress to Done in DOC Project Management Apr 4, 2025

PurpleGuitar deleted the better-localized-book-names-and-chapter-names branch April 4, 2025 19:26
