Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Strip tagged cells from .ipynb notebooks passed to the NotebookLite and JupyterLite directives #180

Merged
merged 15 commits into from
Jul 11, 2024

Conversation

agriyakhetarpal
Copy link
Member

@agriyakhetarpal agriyakhetarpal commented Jun 26, 2024

Description

This PR updates the _LiteDirective class to be able to use nbformat, which is already specified as a dependency in pyproject.toml, to be able to read IPyNB files and remove cells which include "strip" set to "true" in their JSON metadata. This way, it is possible to provide cleaner notebooks for cases similar to scipy/scipy#20303 or with PyWavelets/pywt#741, where notebooks passed to the .. notebooklite:: or the .. jupyterlite:: directives can now be preprocessed to remove certain cells before they are copied to the built documentation.

For example, a notebook cell with the Markdown syntax

```{note}
This text should not exist in the notebook in the built documentation
```

can receive metadata like this

{
  "tags": [
    "strip": "true"
  ]
}

and this cell will be removed by nbformat if strip_tagged_cells is set to True in conf.py, therefore disabling the admonition in the deployed documentation.

Changes made

  1. Added a configuration value for the Sphinx extension, strip_tagged_cells which is set to False by default. If set to True in conf.py, this enables preprocessing behaviour
  2. The preprocessing behaviour is in jupyterlite_sphinx/jupyterlite_sphinx.py, where nbformat is used to read the notebook file and operate on the cells of the notebook.
  3. This feature has been documented in docs/configuration.md.

Additional context

  • The TryExamples directive is not compatible with this feature. This is because we need an IPyNB-based notebook file to add metadata for a cell, while TryExamples dynamically generates notebooks at runtime from doctests.

@agriyakhetarpal agriyakhetarpal force-pushed the feat/strip-directives branch from 538735f to ff738bc Compare June 26, 2024 14:08
@agriyakhetarpal
Copy link
Member Author

agriyakhetarpal commented Jun 26, 2024

In commit 23d2cae, I added an example by setting strip_tagged_cells = True and modified my_notebook.ipynb accordingly – the notebook in the hosted documentation contains two cells, while the one in the source has three – which means that one of the cells (the tagged one) was removed.

This can be noticed on the pages in the documentation preview: https://jupyterlite-sphinx--180.org.readthedocs.build/en/180/directives/jupyterlite.html and https://jupyterlite-sphinx--180.org.readthedocs.build/en/180/directives/notebooklite.html.

I shall revert the commit once this PR has been reviewed.

@agriyakhetarpal
Copy link
Member Author

@steppi, I figured out the problem with the repeated copying for the notebook files, and the good thing is that the repeated copying exists only when the notebook is embedded multiple times in the same document (I'm appalled it took me slightly longer than expected to figure it out, haha). This behaviour is apparently different from that of the TryExamples directive in that individual .ipynb notebooks will likely, in real-life documentation scenarios, be referenced just once in a singleton directive somewhere, and therefore we should not need UUIDs at all, far apart the case of the TryExamples directive. Further, it is unrealistic that two notebooks will be placed somewhere in the source for any project's documentation with the exact same filenames. It was only when I tried the experiment for Melissa below that I understood from the logs that my_notebook.ipynb is referenced multiple times in the jupyterlite-sphinx docs, and therefore is copied multiple times (which may be fixed with the env.temp_data solution you suggested, but I would suggest we leave that optimisation for another PR), while the rest of the notebooks are copied just once, as intended.

@melissawm, I tested this with the seventeen or so notebooks from scipy/scipy#21042 like you suggested (I converted them via Jupytext to IPyNB first, using its CLI) and added them under separate JupyterLite directives. The time taken by nbformat to process the notebooks was not noticeable at all!


P.S. Jupytext provides an API as well. Perhaps another worthwhile feature to add to jupyterlite-sphinx would be to allow .. notebooklite:: and ..jupyterlite:: to accept these text-based Markdown notebooks? Jupytext can then convert the notebook from .md at runtime here itself in these changed lines (maybe even by default), and pass/copy it as .ipynb to JupyterLite – this can potentially resolve the problem of having extra targets/phonies in the docs Makefiles implemented in scipy/scipy#20303 and also removes the addition of a custom Sphinx extension connecting to the builder-inited event in PyWavelets/pywt#741. This should also help make the docs directory cleaner at the time of rebuilds, since Markdown-based notebooks will not be required to be paired with their IPyNB counterparts (which may have been git-ignored or excluded from inclusion in Sphinx).

It is to be noted that the JupyterLite interface does have support for Jupytext in the screenshot below, but I am not sure how to make the Markdown notebook runnable – (I think) it is not supported right now: jupyterlite/jupyterlite#731, so this conversion from .md to .ipynb at build time could be a reasonable solution for notebook authors until JupyterLite itself can gain support.

"Jupytext" under the File tab under the standard JupyterLite interface Jupytext, hovered over in the File tab in the JupyterLite interface, local Sphinx docs for the jupyterlite-sphinx project's JupyterLite directive that embeds notebook files

@Carreau
Copy link
Collaborator

Carreau commented Jun 27, 2024

Should this option be a booolean or a list of tag to strip?

cell_strip_tag = ['solutions', 'strip', 'debug']

?

@Carreau Carreau added the enhancement New feature or request label Jun 27, 2024
@agriyakhetarpal
Copy link
Member Author

Should this option be a booolean or a list of tag to strip?

cell_strip_tag = ['solutions', 'strip', 'debug']

?

I have put some thought about the use cases of having JSON metadata for notebooks, e.g., to include interactive widgets like Thebe, or cells where it is helpful to show inputs but hide outputs, etc., so, yes – having a list of tags could be a reasonable option. I imagine this option will be better coupled with strip_tagged_cells instead of replacing it, though – i.e., have a boolean strip_tagged_cells specify whether to strip these cells, and then cell_strip_tag lists which tags to look for when stripping. What do you think, @Carreau?

@Carreau
Copy link
Collaborator

Carreau commented Jul 2, 2024

What do you think, @Carreau?

I don't have much preference. I tend to think that if it's two options it might be more difficult to configure, and if the list is empty it can play the role of not stripping ?

I think I would be +0.1 for just having a single config option. But maybe others want to pitch in.

@agriyakhetarpal
Copy link
Member Author

@steppi, do you have any additional comments to what @Carreau has mentioned? It would be great if you could take a look at this soon!

@steppi
Copy link
Collaborator

steppi commented Jul 10, 2024

@steppi, do you have any additional comments to what @Carreau has mentioned? It would be great if you could take a look at this soon!

Sure, I'll take a look now.

Copy link
Collaborator

@steppi steppi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good @agriyakhetarpal. I'm in favor of keeping only a boolean config option and not making this more complex than needed. I also have a suggestion about the name of the tag.

jupyterlite_sphinx/jupyterlite_sphinx.py Outdated Show resolved Hide resolved
docs/configuration.md Outdated Show resolved Hide resolved
docs/configuration.md Outdated Show resolved Hide resolved
docs/configuration.md Outdated Show resolved Hide resolved
docs/configuration.md Outdated Show resolved Hide resolved
docs/configuration.md Outdated Show resolved Hide resolved
docs/directives/my_notebook.ipynb Outdated Show resolved Hide resolved
docs/directives/my_notebook.ipynb Outdated Show resolved Hide resolved
docs/directives/my_notebook.ipynb Outdated Show resolved Hide resolved
@agriyakhetarpal
Copy link
Member Author

Reverted the changes I made to the notebook in b853462 since the functionality is working as expected.

Copy link
Collaborator

@steppi steppi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me now. Thanks @agriyakhetarpal!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants