Skip to content

Adding support for "role" attributes for the DocBook reader (second try!) #10932

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 59 commits into
base: main
Choose a base branch
from

Conversation

yanntrividic
Copy link

This PR is the direct result of what has been discussed on #10665. This proposition is basically a cleaner version based on the previous contribution attempt.

More details on what this PR cover can be found in this comment.

This commits adds a getRoleAttr function to get
the value from the role attribute of a DocBook
element.
This value is then added to the returned Pandoc
object by using addPandocAttributes.
Sections have to be considered in a different way
than regular blocks and inlines.
In this case, we wrap the content in a Div element
with a `section` class and a `level` attribute in
addition to the `role` attribute.
See this comment:
jgm@d097c64
Elements parsed with withOptionalTitle do not
automatically get the role attributes transfered
to them. This covers this problem.
@yanntrividic
Copy link
Author

Hello! A month has passed, and in the meantime I have been working on my IDML reader... It is now giving pretty nice results :), and is now easy to install and test out: https://outdesign.deborderbollore.fr/en/md/2_installation.html.

If there is anything I can do at this point to help with the PR, don't hesitate. I'd love to see that be integrated in the main branch, as it would definitely help in making IDML files really parsable by Pandoc!

(ping @lifeunleaded who might be interested as well.)

@jgm
Copy link
Owner

jgm commented Jul 25, 2025

Hi! As you can see, the Windows build has a warning that should be fixed. (Actually all the others should have given this too, but the CI was misconfigured; this has now been fixed.)

Fix that up, rebase, and hopefully we can get it merged soon.

@jgm
Copy link
Owner

jgm commented Jul 25, 2025

Sorry - this warning is not from your code, it comes from a bad commit in main that you rebased on top of. So if you just rebase your changes on top of current HEAD, everything should pass.

Comment on lines +1122 to +1125
return $ case attrValue "role" e of
"" -> content
_ -> divWith ("", ["section"],
("level", T.pack $ show n') : attrs) content
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't be mixing the use of section Divs and bare headings in the same document the way you do here. Why not add the role attribute to the Header? When the resulting AST is passed through makeSections, it will become a section div.

Copy link
Author

@yanntrividic yanntrividic Jul 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello, thanks for taking the time to have a look at it again. I proposed this modification following what I understood from your recommendation in #10665 (comment).

The problem that led to this modification was that the role attributes were applied recursively to all child sections, because of the way addPandocAttributes is designed. From my understanding, we arrived to this "acceptable" solution to avoid this recursion.

Should we figure out something else then?

yanntrividic and others added 21 commits July 25, 2025 10:05
Sections have to be considered in a different way
than regular blocks and inlines.
In this case, we wrap the content in a Div element
with a `section` class and a `level` attribute in
addition to the `role` attribute.
Elements parsed with withOptionalTitle do not
automatically get the role attributes transfered
to them. This covers this problem.
Functions that expect UTF-8-encoded filenames should make it easier to
write platform-independent scripts, as the encoding of the actual
filename depends on the system.

Additionally, this also adds a generalized method to run commands, and
functions to retrieve XDG directory names.

The new functions are `command`, `copy`, `read_file`, `remove`,
`rename`, `times`, `write_file`, `xdg`.
for every PDF engine, not just LaTeX/ConTeXt.

This is part of the fix for jgm#10911.
Export `copyrightMessage` from the unexported module
Text.Pandoc.App.CommandLineOptions
and reexport from Text.Pandoc.App [API change].

This avoids the need for a duplicated version in pandoc-cli, which can
now depend on the library's exported version.
Text.Pandoc.App.CommandLineOptions and pandoc-cli/src/pandoc.hs
had similar code for generating version information.

To avoid duplication, we now export `versionInfo` from
Text.Pandoc.App [API change]. (The function is reexported from the
non-public module Text.Pandoc.App.CommandLineOptions.)  This function
has three parameters that can be filled in when it is called by
pandoc-cli.

This change will make it simpler to revise version information.
This function performs a normalization of Pandoc documents. E.g.,
multiple successive spaces are collapsed, and tables are normalized such
that all rows and columns contain the same number of cells.

Closes: jgm#10356
from the latest chicago-author-date.csl. (Note that this goes
from the 17th to the 18th edition.)

Update tests.
When we updated to the latest chicago-author-date.csl, this test
no longer tested what it was supposed to; so we use a different csl.
It was meant to test subsequent author substitution, but the
new chicago-author-date doesn't do this. So we use a different CSL.
jgm and others added 26 commits July 25, 2025 11:34
[API change] New Avif constructor on ImageType.

Closes jgm#10979.
We were only getting the return status for the tests, apparently,
from `cabal test`. So now we run `cabal build` separately.
The functions allows to check the existence of file-system objects.
Closes jgm#10983 by allowing `nocase` spans to be used to suppress
capitalization of initial word in a footnote.
Previously we set `--ghc-options` in Makefile and CI; but this
overrides the ghc-options set in the pandoc.cabal file.

Better to add options one-by-one using `--ghc-option`.

We no longer use GHC_OPTIONS and just put these extra options
in CABAL_OPTIONS.
For now I want to avoid having to put in lots of CPP.
This implements the changes suggested in jgm#9956, with the exception of
the filecolor/urlcolor one. These would require adding some regex to
guess the link types. This is theoretically possible to do, but it
wasn't clear to me that this is a good thing to put in a default
template. Happy to adjust if you have thoughts on this.

Closes jgm#9956.

Some things to note:

I'm converting colors by passing them as content, as I was seeing pandoc
escape # if that was included.

I set the default fonts for math and code ("raw") to fonts that are
bundled with Typst. These need not be those fonts if there are more
familiar pandoc preferences.
This solves the problem of unwanted capitalization of names
at the beginning of citations in footnotes.

Closes jgm#10983.
These are the development versions of the LaTeX binaries; installable,
e.g., with `tlmgr install latex-base-dev`.

Closes: jgm#10991
Each supported engine is now printed on a line of its own.
Sections have to be considered in a different way
than regular blocks and inlines.
In this case, we wrap the content in a Div element
with a `section` class and a `level` attribute in
addition to the `role` attribute.
Elements parsed with withOptionalTitle do not
automatically get the role attributes transfered
to them. This covers this problem.
@yanntrividic
Copy link
Author

... I am so bad at rebasing. I am sorry. The only thing I tried to do was to change the commit-msg-length, which apparently did not work, and instead it added all those commits that basically are all the commits that were the changes on top of current HEAD since I made my PR. I can make yet another PR if this is cleaner, sorry for this.

@jgm
Copy link
Owner

jgm commented Jul 25, 2025

You need to use git rebase, not git merge.

Try git rebase --interactive origin/main. This will allow you to squash and reword commits and it should get rid of merge ugliness.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants