Adding support for "role" attributes for the DocBook reader (second try!) #10932

yanntrividic · 2025-06-23T18:10:25Z

This PR is the direct result of what was discussed in #10665. This proposal is essentially a cleaner version of the previous contribution attempt.

More details on what this PR covers can be found in this comment.

This commit adds a getRoleAttr function to get the value of the role attribute of a DocBook element. This value is then added to the returned Pandoc object by using addPandocAttributes.

Sections have to be considered in a different way than regular blocks and inlines. In this case, we wrap the content in a Div element with a `section` class and a `level` attribute in addition to the `role` attribute. See this comment: jgm@d097c64

Elements parsed with withOptionalTitle do not automatically get the role attributes transferred to them. This commit addresses that problem.

yanntrividic · 2025-07-23T15:07:30Z

Hello! A month has passed, and in the meantime I have been working on my IDML reader... It is now giving pretty nice results :), and is now easy to install and test out: https://outdesign.deborderbollore.fr/en/md/2_installation.html.

If there is anything I can do at this point to help with the PR, don't hesitate to ask. I'd love to see this integrated into the main branch, as it would definitely help in making IDML files really parsable by Pandoc!

(ping @lifeunleaded who might be interested as well.)

jgm · 2025-07-25T02:00:50Z

Hi! As you can see, the Windows build has a warning that should be fixed. (Actually all the others should have given this too, but the CI was misconfigured; this has now been fixed.)

Fix that up, rebase, and hopefully we can get it merged soon.

jgm · 2025-07-25T03:19:46Z

Sorry - this warning is not from your code, it comes from a bad commit in main that you rebased on top of. So if you just rebase your changes on top of current HEAD, everything should pass.

jgm · 2025-07-25T04:53:47Z

src/Text/Pandoc/Readers/DocBook.hs

+           return $ case attrValue "role" e of
+                      "" -> content
+                      _  -> divWith ("", ["section"],
+                             ("level", T.pack $ show n') : attrs) content


We shouldn't be mixing the use of section Divs and bare headings in the same document the way you do here. Why not add the role attribute to the Header? When the resulting AST is passed through makeSections, it will become a section div.

yanntrividic · 2025-07-25 (edited)

Hello, thanks for taking the time to have a look at it again. I proposed this modification based on what I understood from your recommendation in #10665 (comment).

The problem that led to this modification was that the role attributes were applied recursively to all child sections, because of the way addPandocAttributes is designed. From my understanding, we arrived at this "acceptable" solution to avoid this recursion.

Should we figure out something else then?
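
For reference, the suggestion in the review comment above (put the role on the Header rather than on a wrapping Div) could look roughly like the following. This is only a minimal sketch, not code from this PR: `sectionHeader` is a hypothetical helper name, and the example assumes the `pandoc-types` package for `Text.Pandoc.Builder`. It does not call makeSections; per the review comment, that pass would later turn such a header into a section div.

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Sketch only: `sectionHeader` is a hypothetical helper, not code from this PR.
-- Assumes the pandoc-types package is available.
import Data.Text (Text)
import Text.Pandoc.Builder (Blocks, Inlines, headerWith, para, text, toList)

-- Build a heading that carries the DocBook role as a key-value attribute.
-- An empty role adds no attribute, so sections without a role get a plain Header.
sectionHeader :: Text -> Int -> Inlines -> Blocks
sectionHeader role level title =
  headerWith ("", [], [("role", role) | role /= ""]) level title

main :: IO ()
main = do
  -- A parent section with a role, followed by a child section without one.
  let doc = sectionHeader "appendix" 1 (text "Notes")
         <> para (text "Body")
         <> sectionHeader "" 2 (text "Details")
  -- Print each block on its own line; only the first header has the role.
  mapM_ print (toList doc)
```

Because the attribute sits on the individual Header, nothing is applied to the child sections, which is the recursion concern raised in this thread.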

yanntrividic · 2025-07-25T09:44:24Z

... I am so bad at rebasing. I am sorry. The only thing I tried to do was to change the commit message length, which apparently did not work; instead, it pulled in all the commits that have landed on main since I opened my PR. I can open yet another PR if that would be cleaner. Sorry about this.

jgm · 2025-07-25T17:24:57Z

You need to use `git rebase`, not `git merge`.

Try `git rebase --interactive origin/main`. This will allow you to squash and reword commits and it should get rid of merge ugliness.

yanntrividic added 5 commits June 23, 2025 18:31

Modifying return value for parseBlock, parseInline

e7f187b

This commit adds a getRoleAttr function to get the value of the role attribute of a DocBook element. This value is then added to the returned Pandoc object by using addPandocAttributes.

Add roles to els parsed with withOptionalTitle

b626d72

Elements parsed with withOptionalTitle do not automatically get the role attributes transferred to them. This commit addresses that problem.

Units tests for this PR and new role attributes

4b1bc98

Merge branch 'jgm:main' into main

6b2d679

jgm reviewed Jul 25, 2025

yanntrividic and others added 21 commits July 25, 2025 10:05

Merge branch 'jgm:main' into main

d932d30

Adding roles to headers for sections with roles

08dbdfa

Sections have to be considered in a different way than regular blocks and inlines. In this case, we wrap the content in a Div element with a `section` class and a `level` attribute in addition to the `role` attribute.

Add roles to els parsed with withOptionalTitle

5af69cb

Elements parsed with withOptionalTitle do not automatically get the role attributes transferred to them. This commit addresses that problem.

Units tests for this PR and new role attributes

71edd85

pandoc-lua-engine: Allow hslua-2.4.0 in the tests

58f471c

CI: use windows-2022. windows-2019 is no longer provided.

198eaa0

PDF: make images from MediaBag available in tmp dir...

dbcdb0c

for every PDF engine, not just LaTeX/ConTeXt. This is part of the fix for jgm#10911.

PDF: Use utf8ToText for LaTeX log messages.

71743f9

doc/lua-filters.md: Add example on using pandoc.Table constructor. (j…

0d68b91

…gm#10956)

Update --version copyright dates.

10307e3

Closes jgm#10961.

Use hardcoded string "pandoc" for program name in --version.

a493340

Per GNU guidelines: https://www.gnu.org/prep/standards/html_node/_002d_002dversion.html

Revert "Export copyrightMessage from Text.Pandoc.App module."

8f66ada

This reverts commit 2cf9b55.

Typst writer: set lang attribute in Divs.

f0fc5fd

Closes jgm#10965.

Lua: add normalize function to *Pandoc* objects

965c74a

This function performs a normalization of Pandoc documents. E.g., multiple successive spaces are collapsed, and tables are normalized such that all rows and columns contain the same number of cells. Closes: jgm#10356

Use latest dev citeproc and update the default CSL...

356a507

from the latest chicago-author-date.csl. (Note that this goes from the 17th to the 18th edition.) Update tests.

Fix citeproc-87 test.

2de4cda

When we updated to the latest chicago-author-date.csl, this test no longer tested what it was supposed to; so we use a different csl.

Fix pandoc-citeproc-64 test.

eb2f3d4

It was meant to test subsequent author substitution, but the new chicago-author-date doesn't do this. So we use a different CSL.

Use latest dev citeproc.

1d8218e

jgm and others added 26 commits July 25, 2025 11:34

T.P.ImageSize: support avif images.

7cd0289

[API change] New Avif constructor on ImageType. Closes jgm#10979.

Fix incomplete pattern matches from new ImageType constructor.

23d480d

Fix CI so that -Wall -Werror works again!

a52d8cb

We were only getting the return status for the tests, apparently, from `cabal test`. So now we run `cabal build` separately.

Makefile: add -Wall to ghc options.

3d1be4e

Lua: add function pandoc.path.exists.

8dfb2fa

The function allows checking the existence of file-system objects.

Use latest dev citeproc.

addfa97

Closes jgm#10983 by allowing `nocase` spans to be used to suppress capitalization of initial word in a footnote.

Ensure that all modules have explicit export lists.

6e46b62

CI: don't warn on unused imports in ghc 9.10+.

5f56d62

For now I want to avoid having to put in lots of CPP.

CI: another stab at preventing ghc 9.10, 9.12 from erroring.

538bb04

Fix CI again.

16b6ec0

Use latest dev citeproc.

f9ce3cd

Use dev texmath.

9b7287e

Fix stack.yaml.

8ecb2a8

Org reader: Recognize "fast access" characters in TODO state definiti…

c365732

…ons (jgm#10990)

DocBook reader: Add rowspan support. (jgm#10981)

53c3f88

Revert a test case that changed due to a reverted citeproc change.

6070379

Use latest dev citeproc.

b11afcf

This solves the problem of unwanted capitalization of names at the beginning of citations in footnotes. Closes jgm#10983.

T.P.PDF: clean up makePDF

7a30647

PDF: allow pdflatex-dev and lualatex-dev as PDF engines

6f61b8e

These are the development versions of the LaTeX binaries; installable, e.g., with `tlmgr install latex-base-dev`. Closes: jgm#10991

PDF: Improve error readability when pdf-engine is not supported.

e7e1725

Each supported engine is now printed on a line of its own.

Merge branch 'main' of https://github.com/yanntrividic/pandoc

a517533

Adding roles to headers for sections w/ roles

4c01975

Sections have to be considered in a different way than regular blocks and inlines. In this case, we wrap the content in a Div element with a `section` class and a `level` attribute in addition to the `role` attribute.

Add roles to els parsed with withOptionalTitle

eb8a928

Elements parsed with withOptionalTitle do not automatically get the role attributes transferred to them. This commit addresses that problem.

Merge branch 'main' of https://github.com/yanntrividic/pandoc

cc96c03

yanntrividic added 2 commits July 25, 2025 12:19

Units tests for this PR and new role attributes

ce65132

Merge branch 'main' of https://github.com/yanntrividic/pandoc

7d6d428
