add functions to build html reference manuals #15

Merged
merged 10 commits into from
May 29, 2025

LiNk-NY
Contributor

@LiNk-NY LiNk-NY commented Nov 6, 2024

This is a work in progress. The functions are working but I am not sure how to make them run on the builders.

See here for the local setup step for testing:

https://github.com/r-devel/repos/blob/main/R/minibioc.R

cc: @jwokaty @hpages

@jwokaty
Contributor

jwokaty commented Nov 6, 2024

I will take a look at it tomorrow. I think the BBS has some Python functions that can wrap around R functions.

@jwokaty
Contributor

jwokaty commented Nov 11, 2024

While I can see that build_html_mans produces the manual with links, it's not clear to me how the other functions would be used with the BBS. Could you give a little more context?

For build_html_mans, I see packages_dirs is a list of paths to the repositories of packages and each path is passed to tools::pkg2HTML. Currently, the BBS uses biocViews::extractManuals to get the manuals from the source tarballs in a CRAN-style repository that will be propagated. Looking at tools::pkg2HTML, it sounds like there might be an option to get it from a tarball? I might be misunderstanding the documentation and maybe this is what is meant by 'experimental'. I couldn't seem to make this work on a tarball. This is desirable because I wouldn't have to change much about when and where things happen in the BBS, but if that's not possible, it's okay.

Is src_base the same as repoRoot, which is referred to in other parts of biocViews? It's the path of the top of the CRAN-style repository. I might recommend the same language.

I haven't looked too deeply but I did try to follow the examples, which worked for me.

@LiNk-NY
Contributor Author

LiNk-NY commented Nov 12, 2024

Hi Jen, @jwokaty

I think copilot did a pretty good job at summarizing the code:

The other functions in build_html_mans.R are used to create and update databases of aliases and cross-references for R packages, which can be helpful for creating searchable documentation and reference materials. Here's a brief overview of how each function fits into the overall process:

  1. build_db_from_source:

    • This function generates the aliases.rds and rdxrefs.rds files from the source files of an R package. It reads the documentation files (.Rd files) in the package directory and extracts aliases (alternative names) and cross-references.
    • These .rds files are then saved to the web directory of the package repository, making them available for other functions to use.
  2. build_meta_aliases_db:

    • This function creates or updates a meta-database of aliases for all packages in the specified web directory.
    • It reads the aliases.rds files generated by build_db_from_source and combines them into a single meta-database file (aliases_db_file).
    • The force parameter determines whether to update only the entries with newer aliases.rds files or to rebuild the entire database.
  3. build_meta_rdxrefs_db:

    • Similar to build_meta_aliases_db, this function creates or updates a meta-database of cross-references (rdxrefs.rds files) for all packages in the web directory.
    • It reads the rdxrefs.rds files generated by build_db_from_source and combines them into a single meta-database file (rdxrefs_db_file).
    • The force parameter works the same way, determining whether to update only the entries with newer rdxrefs.rds files or to rebuild the entire database.
  4. build_html_mans:

    • This function generates HTML manuals from the source directories of R packages, which are then saved to the specified directory.
    • It uses the pkg2HTML function to convert the documentation files into HTML format, making them easily accessible and readable online.

End copilot.
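
A minimal sketch of the combining step described in the summary above, for readers who want something concrete. This is not the biocViews implementation: the function name `combine_aliases_sketch`, the `<web_dir>/<package>/aliases.rds` layout, and the toy data (a list mapping Rd file names to aliases) are all assumptions made for illustration.

```r
# Sketch only: combine per-package aliases.rds files into one named list.
# The function name and the <web_dir>/<pkg>/aliases.rds layout are assumptions.
combine_aliases_sketch <- function(web_dir, db_file) {
  files <- list.files(web_dir, pattern = "^aliases\\.rds$",
                      recursive = TRUE, full.names = TRUE)
  # Name each entry after the package directory that holds the file
  names(files) <- basename(dirname(files))
  db <- lapply(files, readRDS)
  saveRDS(db, db_file)
  invisible(db)
}

# Toy input: two fake packages, each mapping an Rd file name to its aliases
web_dir <- tempfile("web")
for (pkg in c("pkgA", "pkgB")) {
  dir.create(file.path(web_dir, pkg), recursive = TRUE)
  saveRDS(list("foo.Rd" = c("foo", paste0("foo-", pkg))),
          file.path(web_dir, pkg, "aliases.rds"))
}
db_file <- file.path(web_dir, "aliases_db.rds")
db <- combine_aliases_sketch(web_dir, db_file)
```

The result is one list with an entry per package, which is the general shape a combined database would need; the real functions may store more detail.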

I couldn't seem to make this work on a tarball. This is desirable because I don't have to change much when/where things happen in the BBS, but if that's not possible, it's okay.

This worked for me on a package tarball:

> tools::pkg2HTML("AnVIL_1.19.3.tar.gz")
Warning message:
In y[i] <- if (is.null(value)) NULL else as.person(value) :
  number of items to replace is not a multiple of replacement length
> file.exists("AnVIL.html")
[1] TRUE

What are the advantages of running it on the tarball? I think running the code on the source package directory allows one to run it at any time (e.g., after the package is updated). The tarballs are dependent on the build process. It may also take more time to untar and parse the tarballs than to work on the source directories.

Is src_base the same as repoRoot, which is referred to in other parts of biocViews? It's the path of the top of the CRAN-style repository. I might recommend the same language.

I don't see where repoRoot is mentioned in the package. Can you point to the code that uses repoRoot?

Yes, it is the first part of the full package URL, e.g.,
https://bioconductor.org/packages/release/bioc/ in
https://bioconductor.org/packages/release/bioc/src/contrib/MultiAssayExperiment_1.32.0.tar.gz

Best regards,
Marcel

@jwokaty
Contributor

jwokaty commented Nov 18, 2024

The discussion last week helped clarify a lot of questions, including getting html from source versus tarballs. We'll get the HTML from source.

I'll include the other clarifications/todos here:

  • all aliases.rds should be combined and placed in src/contrib/Meta following CRAN's example: https://cran.r-project.org/src/contrib/Meta/. rdxrefs.rds files should be handled the same way.
  • These two files should be available when generating the HTML files.
  • It's still not clear if we should do force = TRUE. I think yes unless the amount of time to do it is significant.
  • HTML manuals will not necessarily replace PDF manuals but may appear alongside them. (Need to follow up to determine how to make available on bioconductor.org.)

I misspelled: it's called reposRoot. See:

Establish a top-level directory for the repository, we will refer to this
directory as reposRoot. Place your packages as follows:
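
To make the reposRoot terminology concrete, here is a small sketch that builds the locations mentioned in this thread from a repository root. The package tarball name comes from the example URL given earlier in the conversation; whether the Meta files exist at this particular release path is an assumption.

```r
# Sketch only: derive repository locations from a reposRoot-style prefix.
# The tarball name is the example already quoted in this thread.
reposRoot <- "https://bioconductor.org/packages/release/bioc"

# CRAN-style repositories keep source tarballs under src/contrib
contrib <- paste(reposRoot, "src", "contrib", sep = "/")
tarball_url <- paste(contrib, "MultiAssayExperiment_1.32.0.tar.gz", sep = "/")

# Combined alias and cross-reference databases would sit in src/contrib/Meta
meta_files <- paste(contrib, "Meta", c("aliases.rds", "rdxrefs.rds"), sep = "/")
```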

@LiNk-NY
Contributor Author

LiNk-NY commented Nov 19, 2024

including getting html from source versus tarballs.

It would be less disruptive to generate from the source directories but may not be in sync with the builds. OTOH, generating from tarballs will be in sync with the builds but will have to be run after the tarballs are generated. Feel free to chime in Hervé @hpages.

  • These two files should be available when generating the HTML files.

Not necessarily. These files are for CRAN to inspect / merge into their DB for linking to Bioconductor cross-references.

  • It's still not clear if we should do force = TRUE. I think yes unless the amount of time to do it is significant.

By default, an entry in the combined Rds database is updated only when the package's own Rds file has a more recent mtime than the database file. I think we should keep this default behavior.
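
A rough sketch of an mtime-based check of this kind, using only base R. The file names, the pinned timestamps, and the way `force` is combined with the comparison are assumptions for illustration, not the logic in build_meta_aliases_db or build_meta_rdxrefs_db.

```r
# Sketch only: decide which per-package files are newer than the combined
# database. File names and timestamps are made up for illustration.
tmp <- tempfile("meta")
dir.create(tmp)
db_file <- file.path(tmp, "aliases_db.rds")
pkg_files <- file.path(tmp, c("old_pkg.rds", "new_pkg.rds"))
saveRDS(list(), db_file)
for (f in pkg_files) saveRDS(list(), f)

# Pin modification times so the comparison is deterministic
t0 <- as.POSIXct("2025-01-01 12:00:00", tz = "UTC")
Sys.setFileTime(db_file, t0)
Sys.setFileTime(pkg_files[1], t0 - 3600)  # one hour older than the database
Sys.setFileTime(pkg_files[2], t0 + 3600)  # one hour newer than the database

# With force = FALSE only newer files are selected; force = TRUE selects all
force <- FALSE
stale <- force | (file.mtime(pkg_files) > file.mtime(db_file))
to_update <- basename(pkg_files[stale])
```

With these timestamps only `new_pkg.rds` is selected, which is the incremental behavior described above.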

  • (Need to follow up to determine how to make available on bioconductor.org.)

This will likely be another cell in the table on the landing page, right next to the pdf link.

I misspelled: it's called reposRoot. See

I have changed src_destDir to reposRoot.

@LiNk-NY
Contributor Author

LiNk-NY commented Nov 19, 2024

Note. I've created a PR to update the landing pages with the HTML links:

Bioconductor/bioconductor.org#299

@jwokaty
Contributor

jwokaty commented Nov 19, 2024

For the BBS builds the final repository structure is assembled close to the propagation step when it's easier to point to tarballs vs source repositories. I will work with whatever you make available.

I misunderstood what you said about the 2 RDS files. Great, they don't need to be available for generating HTML. Just put them in the Meta directory. And we should use the defaults for build_meta_*_db. Got it.

@LiNk-NY LiNk-NY marked this pull request as ready for review November 20, 2024 18:59
@jwokaty
Contributor

jwokaty commented Jan 30, 2025

@LiNk-NY I am looking at this again as I am thinking about how to incorporate it with the BBS. Why do the resulting files from build_html_mans follow the structure manuals/packageName/man/packageName.html? When I look at R Universe, they follow packageName/doc/manual.html; for example, https://bioc.r-universe.dev/GenomicRanges/doc/manual.html. If we could choose the structure, I might lean toward the R Universe example.

When I generated the HTML manuals, the links to other packages don't work in the current structure (for example, I'm on the SummarizedExperiment manual and click a link to GenomicRanges) because it expects a path like ../manuals/SummarizedExperiment/man/GenomicRanges.html#topic+GenomicRanges . I think we need to pass hooks to tools::pkg2HTML with a list containing a function to create the structure of the outfiles to write the links following our outfiles structure:

# this is the default: hooks = list(pkg_href = function(pkg) sprintf("%s.html", pkg))
tools::pkg2HTML(
  dir = package_dirs[i],
  hooks = list(pkg_href = function(pkg) sprintf("%s/doc/manual.html", pkg)), # R Universe style
  out = outfiles[i]
)

@LiNk-NY
Contributor Author

LiNk-NY commented Jan 31, 2025

@jwokaty

Why do the resulting files from build_html_mans follow the structure manuals/packageName/man/packageName.html?

Because the PDF manuals are in that location. And both the PDF and HTML pages should be in the same location:
For example,
https://bioconductor.org/packages/release/bioc/manuals/RaggedExperiment/man/RaggedExperiment.pdf

When I generated the HTML manuals, the links to other packages don't work in the current structure (for example, I'm on the SummarizedExperiment manual and click a link to GenomicRanges) because it expects a path like ../manuals/SummarizedExperiment/man/GenomicRanges.html#topic+GenomicRanges

Are you running the example code?
This works on my setup:
https://www.loom.com/share/41be53b7fe26409e8c9307256b75a82d?sid=fed713b4-d5f8-4977-b9a6-f5743924998d

It makes more sense to follow our own template than that of r-universe.

The address bar was not recorded in the video but the pages are properly linked.
minibioc/packages/3.20/bioc/manuals/SummarizedExperiment.html
redirects to minibioc/packages/3.20/bioc/manuals/GenomicRanges.html#topic+GenomicRanges
when clicking on the GenomicRanges link.

@jwokaty
Contributor

jwokaty commented Jan 31, 2025

Yeah, I looked at your other PR for bioconductor.org then it made sense why you selected that path. I altered your example because I couldn't exactly get your minibioc working, so I changed the paths after I figured out what inputs seemed to be needed. You can see some of the input/output to help you understand what I'm seeing.

> bioc_sub <- list.dirs("repos", full.names = FALSE, recursive = FALSE)
> bioc_sub
[1] "Biobase"              "BiocBaseUtils"        "BiocGenerics"        
[4] "DelayedArray"         "GenomicRanges"        "IRanges"             
[7] "S4Vectors"            "SummarizedExperiment"
> packages <- file.path(normalizePath("repos"), bioc_sub)
> packages
[1] "/home/fm/Work/sandbox/repos/Biobase"             
[2] "/home/fm/Work/sandbox/repos/BiocBaseUtils"       
[3] "/home/fm/Work/sandbox/repos/BiocGenerics"        
[4] "/home/fm/Work/sandbox/repos/DelayedArray"        
[5] "/home/fm/Work/sandbox/repos/GenomicRanges"       
[6] "/home/fm/Work/sandbox/repos/IRanges"             
[7] "/home/fm/Work/sandbox/repos/S4Vectors"           
[8] "/home/fm/Work/sandbox/repos/SummarizedExperiment"
> build_html_mans(packages, "bioc")
Warning: class.AnnotatedDataFrame.Rd:151: missing link ‘updateObject’
Warning: class.eSet.Rd:224: missing link ‘updateObject’
Warning: class.ExpressionSet.Rd:147: missing link ‘updateObject’
Warning: class.ExpressionSet.Rd:174: missing link ‘updateObject’
# I'm skipping the other warnings for missing links
[1] "bioc/manuals/Biobase/man/Biobase.html"                          
[2] "bioc/manuals/BiocBaseUtils/man/BiocBaseUtils.html"              
[3] "bioc/manuals/BiocGenerics/man/BiocGenerics.html"                
[4] "bioc/manuals/DelayedArray/man/DelayedArray.html"                
[5] "bioc/manuals/GenomicRanges/man/GenomicRanges.html"              
[6] "bioc/manuals/IRanges/man/IRanges.html"                          
[7] "bioc/manuals/S4Vectors/man/S4Vectors.html"                      
[8] "bioc/manuals/SummarizedExperiment/man/SummarizedExperiment.html"

For your Loom example, you mentioned it had links to ../manuals/packageName.html (was that a mistake?), which doesn't have an issue because the directory structure is simple, but build_html_mans creates manuals/packageName/man/packageName.html. When I inspect a link to GenomicRanges (from the SummarizedExperiment manual), it uses the default link pattern in the hook, which is just a relative "packageName.html". You'll get a 404 clicking on that link.
[Screenshot from 2025-01-30 21-06-59]

I'm happy to huddle to better communicate about this. A hook with something like ../../%s/man/%s.html might fix it?
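
A sketch of what such a hook could look like for the manuals/packageName/man/packageName.html layout. The helper name `pkg_href_sketch` is hypothetical; `hooks` and its `pkg_href` element are the tools::pkg2HTML argument quoted earlier in this thread, and the paths in the commented call are placeholders.

```r
# Sketch only: a pkg_href hook for the manuals/<pkg>/man/<pkg>.html layout.
# From manuals/SummarizedExperiment/man/, "../../" climbs back to manuals/.
pkg_href_sketch <- function(pkg) sprintf("../../%s/man/%s.html", pkg, pkg)
hooks <- list(pkg_href = pkg_href_sketch)

# Relative link the hook would produce for a cross-reference to GenomicRanges
href <- hooks$pkg_href("GenomicRanges")

# The list could then be passed on, for example (placeholder paths):
# tools::pkg2HTML(dir = "path/to/SummarizedExperiment", hooks = hooks,
#                 out = "manuals/SummarizedExperiment/man/SummarizedExperiment.html")
```

Resolved against manuals/SummarizedExperiment/man/SummarizedExperiment.html, the generated link points at manuals/GenomicRanges/man/GenomicRanges.html, which matches the output structure listed above.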

@jwokaty
Contributor

jwokaty commented Feb 5, 2025

@LiNk-NY I think it's easiest for me to wrap pkg2HTML in Python to generate the html manuals, so you don't need to take any action on this. I will also have to wrap the other functions in Python to get them into the BBS. I think this can be merged.

@LiNk-NY
Contributor Author

LiNk-NY commented Feb 6, 2025

you mentioned it had links to ../manuals/packageName.html (was that a mistake?),

Yes, I meant

minibioc/packages/3.20/bioc/manuals/GenomicRanges/man/GenomicRanges.html

When I inspect a link to GenomicRanges (from the SummarizedExperiment manual), it uses the default link pattern in the hook, which is just a relative "packageName.html". You'll get a 404 clicking on that link.

I may have loaded a "fixed" version of SummarizedExperiment I was working on. But with subsequent testing, I see that that specific link is missing a package anchor (that's why it shows up broken in your testing).

Note that this is relevant to the other aim of this change which is to update packages so that they all provide package anchors and link appropriately to the documentation.

@LiNk-NY
Contributor Author

LiNk-NY commented Feb 6, 2025

I think it's easiest for me to wrap pkg2HTML in Python to generate the html manuals

Can you link to the location of this code in the BBS repo?

@jwokaty
Contributor

jwokaty commented Feb 6, 2025

I don't understand what you mean by a 'fixed' version of SummarizedExperiment. Are you saying my example is broken because there wasn't a well-formed package anchor there? I will try testing with lefser. If it's working as is, I will try to use it.

Since I don't know if I can use that function, I tried to generate the HTML in the same place as the PDFs: https://github.com/Bioconductor/BBS/pull/438/files. There's another step that copies them into another place because files get moved a few times during the build. This is just the way that the BBS does it, not necessarily my preference.

I am working on the other part to use your other functions with the BBS.

@LiNk-NY
Contributor Author

LiNk-NY commented Feb 6, 2025

I don't understand what you mean by a 'fixed' version of SummarizedExperiment.

A modified version of SummarizedExperiment with appropriate package anchors.

Are you saying my example is broken because there wasn't a well-formed package anchor there?

Yes, I was also able to reproduce the broken link you were seeing.

Do you know of any set of packages that have correct package anchors to other packages so that I can verify that it's working?

No, we still have to update all the packages with missing package anchors.

@jwokaty
Contributor

jwokaty commented Feb 6, 2025

I just tested with lefser, which should have correct package anchors, but links to other packages from its manual are broken because the function needs a hook that alters the path. You can see from my PR how I am adding a hook and altering the link that's generated. Something like that would fix your function.

@LiNk-NY
Contributor Author

LiNk-NY commented Feb 6, 2025

Thanks for catching that. I added the custom hooks argument that we use in the builders. It now works for links that have an appropriate package reference but still does not work for links that have missing package anchors.

@jwokaty
Contributor

jwokaty commented Feb 7, 2025

I can confirm that links going out to other packages now work. Thank you.

@jwokaty
Contributor

jwokaty commented Mar 11, 2025

It looks good to me. @LiNk-NY @vjcitn Is this okay to merge?

@LiNk-NY
Contributor Author

LiNk-NY commented Mar 11, 2025

Will this code be used in BBS operations? Otherwise, it may not be worth maintaining in biocViews. It can live in the minibioc R package that I am working on for testing.

@jwokaty
Contributor

jwokaty commented Mar 11, 2025

I will use build_db_from_source, build_meta_aliases_db, and build_meta_rdxrefs_db, but I am not sure if I will use build_html_mans. There was a separate request that will change the way that we do manuals in the BBS, so I am considering it.

@LiNk-NY
Contributor Author

LiNk-NY commented May 2, 2025

I can modify the PR to remove build_html_mans since it won't be used. Let me know if this works. @jwokaty

@LiNk-NY
Contributor Author

LiNk-NY commented May 24, 2025

@jwokaty
I updated the PR to remove build_html_mans and renamed the file. I also synced with the devel branch.

@llrs

llrs commented May 28, 2025

AFAIK, generating the HTML pages will require more work and input from a different R core member. Last time I worked on this, at useR! 2025, we were checking WCAG compliance and there were still some issues left. I think it is okay for now to merge the alias and rdxrefs functions and start generating them.

This alone has helped me identify issues in base R and in many (CRAN) packages that do not comply with WRE (Writing R Extensions) regarding xrefs (and some other areas for improvement regarding help pages and xrefs).

@jwokaty jwokaty merged commit 463d1d3 into devel May 29, 2025
@jwokaty
Contributor

jwokaty commented Jun 12, 2025

@LiNk-NY @llrs The first aliases.rds and rdxrefs.rds files for 3.22 software packages are available. I noticed some packages were not included due to an error that I need to investigate, but it would be great if someone could look at the files and assess whether there are other issues. Also, are these needed only for software packages, or for our annotation and experiment packages as well?

@LiNk-NY
Contributor Author

LiNk-NY commented Jun 12, 2025

They are probably needed for workflow, annotation, and experiment data packages as well. There will likely be cross-references to documentation pages in those types of packages.

@llrs

llrs commented Jun 26, 2025

I checked the files and I noticed that some packages (a4, alabaster, alabaster.schemas, assorthead and biodbHmdb) don't have the three expected columns in the rdxrefs.rds file. Is there a reason behind this?

Having a package without xrefs is possible, but inconsistencies in the formatting cause code to fail with errors and might indicate other problems in how that file is generated.

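# URL of the combined cross-reference database for Bioconductor 3.22 software packages
bioc_xrefs_url <- "https://bioconductor.org/packages/3.22/bioc/src/contrib/Meta/rdxrefs.rds"
raw_xrefs <- readRDS(url(bioc_xrefs_url))
# Count the columns of each package's entry and list packages with fewer than three
ncols <- vapply(raw_xrefs, NCOL, numeric(1L))
names(ncols)[ncols < 3L]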

Other than that I'll explore the files more (for example for links to help pages only available on specific OS). Many thanks for providing them.

One of the goals of having these files is to facilitate having HTML help pages for all packages in the CRAN and Bioconductor repositories with links that resolve correctly. Including the other repositories (BioCsoft, BioCann, BioCexp, BioCworkflows and BioCbooks) will make linking help pages easier. I hope they are fast to generate and don't tax the server much.

@jwokaty
Contributor

jwokaty commented Jul 1, 2025

Hi @llrs Thank you for looking at the xref and aliases files.

Maybe a4, alabaster, etc. don't have all columns because they don't have manuals? I guess they don't have references as they are primarily libraries.

I have set up annotations and experiments to also produce these xrefs and aliases files, but we don't produce manuals for books and workflows. Maybe they'll behave the same as a4, alabaster, etc.?

Note: We do have about 46 packages missing from this data due to the way they generate their manuals and how I produce the xrefs and aliases files, which I've listed at Bioconductor/BBS#454.

@llrs

llrs commented Jul 3, 2025

I don't think this is due to a lack of cross references: other packages have the three columns but no rows. This is what I would expect from books and workflows, but it's good to know that they aren't expected to have manuals.
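
The distinction between a well-formed entry with no rows and an entry missing its columns can be sketched on toy data, without downloading anything. The column names and the toy entries below are assumptions for illustration; only the three-column expectation comes from this thread.

```r
# Sketch only: toy stand-in for the combined rdxrefs database.
# Column names and entries are assumptions for illustration.
cols <- c("Target", "Anchor", "Source")
toy_xrefs <- list(
  withRefs = matrix(c("GRanges", "GenomicRanges", "foo.Rd"), nrow = 1,
                    dimnames = list(NULL, cols)),
  noRefs = matrix(character(), nrow = 0, ncol = 3,
                  dimnames = list(NULL, cols)),
  malformed = character()
)

ncols <- vapply(toy_xrefs, NCOL, integer(1L))
nrows <- vapply(toy_xrefs, NROW, integer(1L))

# Entries with the wrong shape versus valid entries with no cross-references
malformed <- names(ncols)[ncols < 3L]
empty_ok <- names(ncols)[ncols == 3L & nrows == 0L]
```

A package without cross-references shows up in `empty_ok`, while an entry that would trip up downstream code shows up in `malformed`.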

I missed the issue about those 46 packages. This is more concerning as they include some packages that are widely used. I will check if there are some references to them from CRAN's packages.

But it is a surprise that the aliases file can't be generated. This might indicate a different issue than requiring packages to be installed to process the documentation. If I discover anything, I'll report back to the linked discussion.
