@linearcombination linearcombination commented Apr 2, 2025

DOC:

  • Deeper localization for book names and chapter names
  • Optional chapter names or no chapter names (just chapter numbers) - selectable on Settings page of UI
  • More consistent formatting across heterogeneous source USFM
  • Better performance through:
    • Reorganizing the workload amongst Celery background tasks so that they can use the same shared volumes for assets, which allows more caching to occur
    • More aggressive caching in a few more/new places
    • Batching git clone calls into one CLI process rather than forking a process for every clone. This saves the resource allocation and teardown for all those OS-level processes and thus should improve memory use
  • Better module organization for consistency amongst DOC, STET, and Passages apps
  • Fix bug in 1st page title when multiple books chosen
  • See git logs for more

STET:

  • Fixes all known issues and feature requests for STET as reported by PO
  • Handle bolding of multiple supplied words in source text. Words supplied in 4th column of STET input doc.

Passages:

  • Adds the ability to add all NT Survey Reviewer's Guide passages at once in the UI, as requested by PO

Languages whose repos are laid out one directory per chapter get
deeper localization of book names and chapter names with this update.
TS will now optionally provide an additional column where the words
that should be bolded in the source text are listed.
abu language has repeated USFM chapter markers
Added column containing words to bold
DOC, STET, and Passages should be at the same level and separate
Put DOC, STET, and Passages modules at same level
So that it doesn't clash with system package by same name
Not currently using focus mark on these tests
First language title for display on first page of generated document
only included the first book of potentially many chosen. Fixed.
To clarify that failure was intended
This is important because in the passages app, the user might select
to add NT Survey Reviewer's Guide passages en masse more than once in
the UI and we would want the passages to be duplicated
In case we want to serialize to JSON a full RGBook in the future using
FastAPI/Pydantic auto-encoding.
USFM layout in repo comes in mostly two forms:

- one file per book, e.g., 1-GEN.usfm, or
- one directory per chapter, with one file per verse span in that
  directory (which we pull together into one file per book)

The latter layout provides a little more localization through
<root>/front/title.txt and <root>/<chapter>/title.txt files, which in
turn often results in different (more complete) use of USFM chapter
labels. Because of this, the two layouts have to be handled differently
when splitting into chapters, since the regex that the splitting hinges
on is different for each case.
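As a rough illustration of regex-based chapter splitting, here is a minimal sketch that cuts a book's USFM at each `\c <number>` marker. The function name and the pattern are assumptions made for illustration; the actual patterns used for each of the two layouts are not shown in this PR description.

```python
import re

# Assumed pattern: a USFM chapter marker such as "\c 1" at the start of a line.
# It does not match related markers like "\cl" (chapter label).
CHAPTER_MARKER = re.compile(r"^\\c\s+(\d+)", re.MULTILINE)


def split_into_chapters(usfm: str) -> list[tuple[int, str]]:
    """Return (chapter number, chapter text) pairs in document order.

    Hypothetical helper: text before the first chapter marker (e.g., \\id
    lines) is ignored, and repeated markers simply yield repeated entries.
    """
    matches = list(CHAPTER_MARKER.finditer(usfm))
    chapters: list[tuple[int, str]] = []
    for i, match in enumerate(matches):
        # A chapter runs from its marker up to the next marker or end of text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(usfm)
        chapters.append((int(match.group(1)), usfm[match.start():end].strip()))
    return chapters
```

Returning a list of pairs rather than a dict keeps repeated chapter markers visible to the caller instead of silently overwriting earlier text.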
Also update a few python and node packages
Working to enforce consistency in generated USFM whether that USFM is
the one file per book variety or the file per verse variety.
This is so because worker threads in Celery in a Docker environment
have their own copy of the container, including its file system. If
you first request assets, via git, in a non-worker thread and then
have to request them again in a later step, e.g., get languages, get
books, and those actions do not share the same filesystem, then you
will have to clone the repos again. If both are worker threads, they
execute in the copy of the container used by the worker, and git will
not have to clone the repos since they were already acquired in an
earlier step in the UI.
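The benefit of a shared filesystem can be sketched as a simple "clone only if missing" check. This is an illustration under assumptions, not the project's code: `needs_clone`, the `assets_dir` parameter, and the example repo name are hypothetical.

```python
from pathlib import Path


def needs_clone(assets_dir: Path, repo_name: str) -> bool:
    """Return True when the repo is not yet present in the assets directory.

    Hypothetical helper: a repo counts as present when its working copy
    contains a .git directory.
    """
    return not (assets_dir / repo_name / ".git").is_dir()
```

If every step sees the same `assets_dir`, a later step finds the earlier clone and skips the network round trip; if each step has its own filesystem, this check is always true and the clone is repeated.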
This plays better with the splitting algo by simplifying it. Now
getting consistent localized chapter labels when USE_CHAPTER_LABELS is
True and when USE_LOCALIZED_CHAPTER_LABEL is True.
Because docx layout is configurable, there is currently no consistent
docx styled element that a template header could count on being there.
This would mean that when such an element is not there, you get an
error in the header instead. Eventually maybe we will have dynamic
header logic for docx.
Same reason as given in the commit log for the other docx template two
commits ago
Some languages, e.g., byn-reg-dan, have incorrect information in their
<chapter_dir>/title.txt files: a USFM chapter marker such as \c 1 where
the chapter label text (sans USFM marker) should have been.
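One way to guard against such files is to reject title text that looks like a USFM marker and let the caller fall back to a default label. The sketch below is an assumption about how that could be done; `chapter_label_from_title` is a hypothetical name, and the "starts with a backslash" rule is a simplification.

```python
from typing import Optional


def chapter_label_from_title(raw: str) -> Optional[str]:
    """Return usable chapter label text, or None if the content is unusable.

    Hypothetical rule: empty content, or content that begins with a
    backslash (i.e., a USFM marker such as "\\c 1"), is not a label.
    """
    text = raw.strip()
    if not text or text.startswith("\\"):
        return None
    return text
```

A caller could then use the chapter number alone whenever `None` comes back.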
Added the ability to toggle chapter labels in DOC UI under a new
optional settings area.
This should help performance both in terms of speed and in memory use
because it is costly in Python to spin up a process for every repo
clone we need to do when instead we can batch them into one CLI call,
i.e., into one process dispatch.
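A minimal sketch of the batching idea, assuming a POSIX `sh` is available: all clone commands are joined into one shell script and dispatched with a single `subprocess.run` call from Python. The function names and the (url, target) input shape are assumptions for illustration; the project's actual batching code is not shown here.

```python
import shlex
import subprocess


def build_batched_clone_script(repos: list[tuple[str, str]]) -> str:
    """Build one shell script that clones every (url, target_dir) pair in order."""
    commands = [
        f"git clone {shlex.quote(url)} {shlex.quote(target)}"
        for url, target in repos
    ]
    # ";" rather than "&&" so that one failed clone does not stop the rest.
    return "; ".join(commands)


def clone_all(repos: list[tuple[str, str]]) -> int:
    """Run all clones via a single dispatch from Python.

    Returns the shell's exit code, which with ";" reflects only the last
    clone. Returns 0 without spawning anything when there is nothing to do.
    """
    if not repos:
        return 0
    script = build_batched_clone_script(repos)
    return subprocess.run(["sh", "-c", script], check=False).returncode
```

The shell still runs git once per repo, so the saving in this sketch is on the Python side: one process dispatch instead of one per clone.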
github actions server is slooooow
@PurpleGuitar PurpleGuitar merged commit 6bb6c52 into doc-dev.walink.org Apr 4, 2025
6 of 7 checks passed
@github-project-automation github-project-automation bot moved this from In progress to Done in DOC Project Management Apr 4, 2025
@PurpleGuitar PurpleGuitar deleted the better-localized-book-names-and-chapter-names branch April 4, 2025 19:26