[singlehtml] add docname to section anchor to make them unique #13739

gastmaier · 2025-07-20T14:10:55Z

Purpose

Follow-up to #13717 that inverts the logic: instead of patching the toctree to yield "#id1" instead of "#document-path/to#id1", prefix the section id with the docname, which solves the non-unique ids in singlehtml.
This allows removing post Sphinx transforms like in here

Top level overview of current behavior

ID collisions are resolved per doc (#already-used -> #id1, #already-used -> #id2).
There is no ID collision resolution in the singlehtml step.

Approach taken

Based on the LaTeX builder solution.
The sphinx/writers/latex.py#hypertarget[withdoc=True] method prefixes the docutils id with the docname.
In my implementation I edit ids[0] directly to avoid overriding the whole visit_section method, but I understand if I am asked not to modify the tree and to override the method instead.

On the format #document-test/extra#id1

It is compatible with HTML anchoring and with CSS and JavaScript selectors, but requires escaping:

#document-test\/extra\#test {color: #f00;}

document.querySelector('#document-test\\/extra\\#test')
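The escaping rule above can be sketched in Python for tooling that has to build selectors for these anchors. This is only an illustration under a simplifying assumption: every character other than ASCII letters, digits, `_` and `-` gets a backslash, and a leading digit is not handled. The helper name `escape_css_id` is hypothetical and not part of Sphinx.

```python
import re


def escape_css_id(fragment: str) -> str:
    """Backslash-escape characters that are not plain CSS identifier characters.

    Simplified assumption: ASCII letters, digits, '_' and '-' are left as-is;
    everything else (such as '/' and '#') is prefixed with a backslash.
    """
    return re.sub(r"([^A-Za-z0-9_-])", r"\\\1", fragment)


anchor = "document-test/extra#test"
selector = "#" + escape_css_id(anchor)
print(selector)
```

In a browser, the standard `CSS.escape()` function serves the same purpose and covers the cases this sketch ignores.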

Tests

The following tests are relevant:

tests/test_builders/test_build_html_tocdepth.py
test_build_html_numfig.py

References

jayaddison · 2025-08-02T20:34:29Z

Hi @gastmaier - I'm a former semi-regular volunteer contributor here, although I have been less active recently. Thanks for the pull request; and sorry that I did not notice the toctree constructor problem, as you mention in #13717.

I am reading both #13717 and this PR #13739 to try to understand the different approaches and reasons for them.

Also: do you have a test case that we could add under tests/roots that demonstrates the problem? I suppose it would need to include a table of contents of some kind and have a corresponding singlehtml test case.

gastmaier · 2025-08-03T16:53:29Z

Hi @jayaddison, maybe by extending tests/test_builders/test_build_html_tocdepth.py to check for duplicated ids?
As is, it already checks that the ids are as expected (e.g., the PR changes things in 3 locations to keep passing the test), but not for duplicated ids, so I guess I can add that assertion.

jayaddison · 2025-08-03T20:07:05Z

@gastmaier that sounds perfect, yep! (I'd forgotten about those tests)

akhilsmokie7-cloud

Align with purpose

gastmaier · 2025-08-30T17:34:38Z

Hi @jayaddison and @akhilsmokie7-cloud, I rebased and added the test to check for duplicated ids.
This PR relies on changing the ids during the write step.
Initially I didn't really like the approach, but I recently stumbled on the fact that the html build also changes the image src paths during the write step (original/path/to/image.png -> _images/image_<counter>.png), so I am now more comfortable with this approach.

I added the test at the bottom; checking out fe728f4 will fail at

FAILED tests/test_builders/test_build_html_tocdepth.py::test_unique_ids_singlehtml - AssertionError: assert 16 == 15

as expected, since at f5457f1
I purposely added a section called FooBar to both foo and bar, forcing the same id in both pages, which is a problem only for the single-page output.
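For readers who want to reproduce the check outside the Sphinx test suite, a duplicate-id check of this kind can be sketched with the standard library alone. This is not the test added in the PR; the names `IdCollector` and `duplicated_ids` and the sample HTML are made up for illustration.

```python
from collections import Counter
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Collect the value of every id attribute found in an HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.ids: list[str] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id" and value is not None:
                self.ids.append(value)


def duplicated_ids(html: str) -> list[str]:
    """Return the ids that occur more than once, sorted alphabetically."""
    collector = IdCollector()
    collector.feed(html)
    return sorted(i for i, n in Counter(collector.ids).items() if n > 1)


# Hypothetical flattened page: two documents each contribute a "foobar" section.
colliding = '<section id="foobar"></section><section id="foobar"></section>'
qualified = '<section id="/foo/#foobar"></section><section id="/bar/#foobar"></section>'

assert duplicated_ids(colliding) == ["foobar"]
assert duplicated_ids(qualified) == []
```

Comparing the length of the id list with the length of its set, as the added test does, detects the same condition; returning the offending ids just makes a failure easier to read.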

On "the html build also changes the images src path during the write step", this is what I am talking about:
https://github.com/sphinx-doc/sphinx/blob/master/sphinx/writers/html5.py#L754-L755

CI note:
Failing test

FAILED tests/test_directives/test_directive_only.py::test_sectioning - AssertionError: Section out of place: '1.6.2. Subsection'
assert '1.6.1.1.' == '1.6.2.'

is due to 2e51b787680cefdfe56b3438d809e6476600a47e

Thanks,

jayaddison · 2025-09-02T09:34:21Z

sphinx/builders/singlehtml.py

        for docname, secnums in self.env.toc_secnumbers.items():
            for id, secnum in secnums.items():
-                alias = f'{docname}/{id}'
+                alias = f'{docname}{id}'


What kind of values are possible for the docname and id?

(also: I guess people shouldn't have written hyperlinks or saved bookmarks with the assumption that these aliases are stable? but, even so - if we change the format, I guess we would break those?)

@gastmaier in fact: I'm not sure where these / separator characters appear. What does this code relate to?

For singlehtml, at the assemble toctree step, the href is a tuple of docname and refid, rendered as
#document-path/to/#id1 to try to avoid the refid conflict problem in singlehtml mode. That didn't work because it only patched the toctree, while the content body still had the non-unique ids.

My PR changes the toctree href format from
#document-path/to/#id1 to #document-path/to#id1 (removes the trailing slash)
and the content ids from
#id1 to #document-path/to#id1 (adds the doc prefix to make them unique).
The new template is therefore:
#document-{doc}#{id}
a direct tuple of docname and refid, without the slash.

These are valid HTML anchors, but they do require escaping when used in selectors with:
css

#document-test\/extra\#test {color: #f00;}

and javascript

document.querySelector('#document-test\\/extra\\#test')

singlehtml.zip
here is a singlehtml build with the patch

jayaddison · 2025-09-02T09:37:10Z

sphinx/writers/html5.py

            self.body.append('</dt>')

+    def visit_section(self, node: section) -> None:
+        if self.builder.name == 'singlehtml' and node['ids']:


We don't seem to use many @property methods in the Sphinx writers, but maybe this singlehtml condition is getting to the point where it makes sense (this is the third potential callsite, I think?).

jayaddison · 2025-09-02T09:51:51Z

Maybe pedantic of me to mention, but: running the test code without the fix in place does confirm that the test case fails (duplication of foobar-b1 alias).

jayaddison · 2025-09-02T09:53:34Z

(I attempted that to reassure myself and to learn slightly more about how the fix works)

gastmaier · 2025-09-02T10:32:44Z

Drafting again, I spotted more links using the non-doc-prefixed anchor in the body.
I spotted that explicit refs

.. _explicit-ref:

are not being prefixed, but the links to them are correct (document-path/to#explicit-ref).

I will give it yet another try, but this time traversing the pickled doctree to patch all ids early on, instead of patching at the node visits.

Sample of the new approach:
doc.tar.gz

gastmaier · 2025-09-03T08:42:58Z

Applied the ruthless traverse to patch all (ref)?ids early on, instead of patching at the node visits.

This approach avoids mass overriding of every docutils method under the sun, e.g. the starttag method for the sneaky explicit ref <span id="<id>">.

The procedure is to patch the doctree (prefix_ids_with_docname) after the assemble_toctree and before the other singlehtml patches (assemble_toc_secnumbers and assemble_toc_fignumbers), which have also been adjusted to match the existing document-<doc>#<id> format instead of the previous loose <doc>/<id> format.

Since the call stack is a little hidden, here is a summary:

@builders/singlehtml
write_documents
  - assemble_doctree:
    - inline_all_toctrees
    - resolve_references
      -  apply_post_transforms
    - prefix_ids_with_docname (new)
  - assemble_toc_secnumbers
  - assemble_toc_fignumbers

jayaddison · 2025-09-03T09:32:42Z

sphinx/builders/singlehtml.py

+            if 'refid' in node or 'ids' in node:
+                docname = env.path2doc(doc['source'])
+            if 'refid' in node:
+                node['refid'] = 'document-' + docname + '#' + node['refid']
+            if 'ids' in node:
+                node['ids'] = ['document-' + docname + '#' + id for id in node['ids']]


I'll plan to do this within the next 24h or so, but I'll ask in case it is something you could do quickly: could you print out two columns of text with the before and after values for these node attributes when building a non-trivial project (easiest/safest choice: Sphinx itself)?

e.g.

             before           after
refids       [sample/#foo]    [document-sample#foo]
node_id      sample/#foo      document-sample#foo

The reason I ask: I'd like to inspect the places where the results differ, and in particular how the code changes achieve uniqueness of the results.

(I'm also wondering whether docutils -- which produces the node objects, if I understand correctly - could help us and allow us to fix this in a more central location; and I hope that viewing the comparison columns may also help to understand whether that is realistic or whether this is some Sphinx-specific quirk)

Nope, scratch that - I think that docutils is unaware of the notion of docnames, so whatever is going on here must, I think, be part of Sphinx itself.

So docutils provides the resolved tree to the builder, with each doc being a document.
Sphinx guarantees that the ids are unique per doc, and the filesystem guarantees that the docname is unique (you cannot have two identical paths).
But the singlehtml builder flattens everything into the root doc index, losing the docname information and causing non-unique ids after flattening.
This fix recovers the docname and patches it into the id itself.
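The idea can be sketched without docutils, modelling each node as a plain dict. This is an illustration only and not the code in the PR: the function names and the sample documents are hypothetical, and the id format is the document-<doc>#<id> one from the diff quoted above.

```python
def qualify(docname: str, id_: str) -> str:
    """Build a docname-qualified id, following the format in the quoted diff."""
    return f"document-{docname}#{id_}"


def flatten_with_qualified_ids(docs: dict[str, list[dict]]) -> list[dict]:
    """Merge per-document node lists into one list, prefixing ids and refids."""
    flat = []
    for docname, nodes in docs.items():
        for node in nodes:
            patched = dict(node)  # copy, so the per-document nodes stay untouched
            if "ids" in patched:
                patched["ids"] = [qualify(docname, i) for i in patched["ids"]]
            if "refid" in patched:
                patched["refid"] = qualify(docname, patched["refid"])
            flat.append(patched)
    return flat


# Both documents use "id1", which is unique per document but not once merged.
docs = {
    "foo": [{"ids": ["id1"]}, {"refid": "id1"}],
    "path/to/bar": [{"ids": ["id1"]}],
}
flat = flatten_with_qualified_ids(docs)
all_ids = [i for node in flat for i in node.get("ids", [])]
assert len(all_ids) == len(set(all_ids))
```

Because the docname is unique and the ids are unique within each document, the combined string is unique across the flattened page, and a reference keeps pointing at the target from its own document.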

The Sphinx documentation itself, attached below, has conflicts: there are many duplicated id1.

singlehtml.zip

The table requested (attached because it is too long):

ids.md

Thanks very much @gastmaier - that makes the problem and fix nice and clear.

I'm reading the comparison file at the moment - in particular I'm interested to find whether any of the before elements included a / delimiter -- I haven't found any so far. If there are none, then that would completely resolve my concern about breaking any existing hyperlinks containing that character.

Do you have any thoughts about whether we should always include the complete document path prefix? Or whether, for example, it could be omitted for unambiguous/unique IDs?

I would patch all of them at the moment; it makes sense to me to store the lost docname information in the id itself, and it is clearer to debug.

For the toctree, before the PR, it would already generate hrefs in the format document-<doc>#<id>, so this would need to be assessed as well. That's what #13717 tried to fix, only to uncover the collision issue.

And there are so many visit_* methods that need to be patched to handle every corner case that unifying into a single format early on (after SphinxPostTransform, before the other singlehtml patches) seems to be the only reliable approach.

The latex builder does patch at the visit_* methods with the sphinx/writers/latex.py#hypertarget[withdoc=True] method, but I don't see that working with html, since it is considerably more convoluted: each visit would require some if builder.name == 'singlehtml'.

Thanks @gastmaier. I agree. Also: in my opinion we should highlight this as a hyperlink-breaking change -- for singlehtml builds -- in the changelog notes.

One remaining concern I have: the document- prefix on every target seems fairly verbose. I am wondering whether we could restore the path delimiter (/) -- and maybe even add a root prefix (/{docname}/#{refid}) -- to reduce the risk of duplication against other anchors declared by projects.

I can give it a try, but I need to know exactly the format we want to achieve. As I understood it, in the html it would be:

href="#/path/to/doc/#anchor_name"

?

That's the same format that I had in mind too, yep 👍

gastmaier · 2025-09-06T16:19:54Z

Hi @jayaddison applied the new /docname/#id format.
Attached is sphinx doc as singlehtml
singlehtml.zip

jayaddison · 2025-09-08T06:04:21Z

sphinx/builders/singlehtml.py

+    def prefix_ids_with_docname(self, tree: nodes.document) -> None:
+        # Append docname to refids and ids using format document-<docname>#<id>.
+        # Compensates for loss of the pathname section of the href, that
+        # ensures uniqueness in the html builder.


What do you think about adjusting the name of this method to ensure_fully_qualified_refids, or similar?

Note: when I suggest the terminology fully-qualified there, I'm borrowing it as a name from DNS: https://en.wikipedia.org/wiki/Fully_qualified_domain_name

Will do, I see in the codebase similar ensure_ methods that alter the doc without return.
I was going to suggest get_fully_qualified_refids, but it is not a get method so it would be misleading.

gastmaier · 2025-09-08T07:26:36Z

Done minor requested changes

jayaddison · 2025-09-08T08:41:38Z

sphinx/builders/singlehtml.py

+    def ensure_fully_qualified_refids(self, tree: nodes.document) -> None:
+        # Append docname to refids and ids using format document-<docname>#<id>.
+        # Compensates for loss of the pathname section of the href, that
+        # ensures uniqueness in the html builder.


Thanks for the method renaming! Please also replace the comment with a docstring.

jayaddison · 2025-09-08T19:35:08Z

sphinx/builders/singlehtml.py

+            doc = node.document
+            if doc is None:
+                continue
+            env = doc.settings.env
+            if 'refid' in node or 'ids' in node:
+                docname = env.path2doc(doc['source'])


Is there any situation where node.document can be empty? If not, then we can simplify:

Suggested change

-            doc = node.document
-            if doc is None:
-                continue
-            env = doc.settings.env
-            if 'refid' in node or 'ids' in node:
-                docname = env.path2doc(doc['source'])
+            if 'refid' in node or 'ids' in node:
+                docname = self.env.path2doc(node.document['source'])

mypy complains

sphinx/builders/singlehtml.py:99:36: error: Value of type "document | None" is not indexable [index]

I tried the NodeMatcher.findall but got the same result.
With cast

doc = cast(nodes.document, node.document)

I get

sphinx/builders/singlehtml.py:97:19: error: Redundant cast to "document" [redundant-cast]

assert node.document is not None works
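A minimal sketch of why the assert satisfies the type checker, using stand-in classes rather than the real docutils types (`Document`, `Node` and `source_of` are hypothetical names): the assert narrows an optional attribute to its non-None type for the statements that follow, so no cast is needed.

```python
from typing import Optional


class Document(dict):
    """Stand-in for a document object that can be indexed like a mapping."""


class Node:
    """Stand-in for a node whose owning document may be unset."""

    def __init__(self, document: Optional[Document]) -> None:
        self.document = document


def source_of(node: Node) -> str:
    # Without this assert, a type checker reports that an Optional value
    # is not indexable; the assert narrows the type to Document.
    assert node.document is not None
    return node.document["source"]


print(source_of(Node(Document(source="index.rst"))))
```

At runtime the assert also turns an unexpected None into an immediate AssertionError instead of a less obvious TypeError further down.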

Nice, thank you 👍

jayaddison

Looks great to me - thank you, @gastmaier!

Note that I am not a core contributor here, so my review is not binding and I cannot merge the PR - even so, it looks good to me.

cc @sphinx-doc/developers (does this GitHub team still exist?)

jayaddison · 2025-09-09T08:08:59Z

Ah, I hadn't noticed the failing tests, though - we'll want to fix those too.

gastmaier · 2025-09-09T11:10:35Z

@jayaddison well at least the test added is working.
Thank you for reviewing this, I believe now it is finally finished

gastmaier · 2025-09-23T19:58:10Z

Hi @AA-Turner can you revisit this one? 😄
Thank you,

edit: force pushed to resolve merge conflict in CHANGES.rst

Since the singlehtml aggregates all doc files into a single html page during the write step, and the ids must be unique for proper link anchoring, add test that collects all ids in the page and checks if all ids are unique, by asserting the length of the list against it as a set.

gastmaier force-pushed the toctree-singlehtml2 branch 3 times, most recently from e6b65fb to 5117057 Compare July 20, 2025 14:23

gastmaier marked this pull request as ready for review July 21, 2025 07:39

AA-Turner added the sprint label Jul 21, 2025

jayaddison mentioned this pull request Jul 27, 2025

[singlehtml] toctree no filename with anchor #13717

Closed

akhilsmokie7-cloud reviewed Aug 4, 2025

gastmaier force-pushed the toctree-singlehtml2 branch 2 times, most recently from c50ba56 to 910de47 Compare August 30, 2025 17:32

gastmaier force-pushed the toctree-singlehtml2 branch from 910de47 to 3a92a34 Compare September 2, 2025 08:56

jayaddison reviewed Sep 2, 2025

gastmaier marked this pull request as draft September 2, 2025 10:32

gastmaier force-pushed the toctree-singlehtml2 branch 2 times, most recently from 9edcc87 to 82fae9f Compare September 3, 2025 08:40

gastmaier force-pushed the toctree-singlehtml2 branch from 82fae9f to bfa9f06 Compare September 3, 2025 09:16

gastmaier marked this pull request as ready for review September 3, 2025 09:16

jayaddison reviewed Sep 3, 2025

gastmaier force-pushed the toctree-singlehtml2 branch 3 times, most recently from e6a707b to 0f7ce18 Compare September 6, 2025 16:19

jayaddison reviewed Sep 8, 2025

jayaddison reviewed Sep 8, 2025

gastmaier force-pushed the toctree-singlehtml2 branch 2 times, most recently from 39b0657 to 88ac2dd Compare September 8, 2025 07:25

jayaddison reviewed Sep 8, 2025

gastmaier force-pushed the toctree-singlehtml2 branch from 88ac2dd to abdd3de Compare September 8, 2025 09:26

jayaddison reviewed Sep 8, 2025

gastmaier force-pushed the toctree-singlehtml2 branch 2 times, most recently from fee3c39 to c455f96 Compare September 9, 2025 08:00

jayaddison approved these changes Sep 9, 2025

gastmaier force-pushed the toctree-singlehtml2 branch from c455f96 to 3de2154 Compare September 9, 2025 11:03

jayaddison approved these changes Sep 9, 2025

gastmaier added 5 commits October 15, 2025 17:03

Add section with same title for test-tocdepth/[foo/bar]

ace8fce

To assert unique ids in singlehtml builder. Signed-off-by: Jorge Marques <[email protected]>

Update AUTHORS.rst and CHANGES.rst

c9c7b90

[singlehtml] Append docname to refid and ids

56ac792

Use doc path to make ids unique with format ``/docname/#id``. Compensates for the loss of the pathname in the href. This will break existing hyperlinks to `singlehtml` HTML documents since it alters the format

[singlehtml] Reformat fignum and secnum tuple

1872ce1

Format as ``/docname/#id`` to match other parts.

gastmaier force-pushed the toctree-singlehtml2 branch from 3de2154 to 1872ce1 Compare October 15, 2025 15:05
