fix: find paragraphs in elements with images in docx #1486

Open: wants to merge 3 commits into base: main

Manuel030
Contributor

Some text is not found when using the MsWordDocumentBackend. An example docx file where this happens is attached: paragraph_in_image.docx

The pragmatic solution is to attempt to add text elements even when a drawing expression is found.
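
As a rough illustration of the idea (not the actual MsWordDocumentBackend code), the sketch below walks a simplified, hand-written WordprocessingML fragment with the standard library. The names `collect_text`, `paragraph_text` and the `skip_drawings` flag are hypothetical, and real DOCX drawings wrap their text box content in more elements than shown here.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# Simplified fragment written for this example; a real document.xml nests
# the text box content deeper inside the drawing.
SAMPLE = f"""
<w:body xmlns:w="{W}">
  <w:p><w:r><w:t>Plain paragraph</w:t></w:r></w:p>
  <w:p>
    <w:r>
      <w:drawing>
        <w:txbxContent>
          <w:p><w:r><w:t>Text inside a drawing</w:t></w:r></w:p>
        </w:txbxContent>
      </w:drawing>
    </w:r>
  </w:p>
</w:body>
"""


def paragraph_text(p):
    """Concatenate the text of every w:t descendant of a paragraph."""
    return "".join(t.text or "" for t in p.iterfind(".//w:t", NS))


def collect_text(body, skip_drawings):
    """Collect paragraph texts from the direct w:p children of the body.

    skip_drawings=True mimics the reported problem: a paragraph that
    contains a drawing is treated as a picture only and its text is lost.
    skip_drawings=False still attempts to read text from such paragraphs.
    """
    texts = []
    for p in body.findall("w:p", NS):
        has_drawing = p.find(".//w:drawing", NS) is not None
        if has_drawing and skip_drawings:
            continue
        texts.append(paragraph_text(p))
    return texts


root = ET.fromstring(SAMPLE.strip())
print(collect_text(root, skip_drawings=True))   # ['Plain paragraph']
print(collect_text(root, skip_drawings=False))  # ['Plain paragraph', 'Text inside a drawing']
```

With `skip_drawings=False` a paragraph that holds only an image and no text would yield an empty string, which is the side effect discussed in the review below.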

Checklist:

  • Documentation has been updated, if necessary.
  • Examples have been added, if necessary.
  • Tests have been added, if necessary.


mergify bot commented Apr 28, 2025

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🔴 Require two reviewer for test updates

This rule is failing.

When test data is updated, we require two reviewers

  • #approved-reviews-by >= 2

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:

@PeterStaar-IBM
Contributor

@Manuel030 Thank you for the PR! Could you add this document as a test?

Signed-off-by: Manuel030 <[email protected]>
@Manuel030
Contributor Author

@PeterStaar-IBM Sure

item-22 at level 4: paragraph: Here are some interesting things a respectful duck could eat:
item-23 at level 4: table with [4x3]
item-24 at level 4: paragraph:
Contributor

I see this solution generates empty text paragraphs, perhaps every time we encounter a drawing.
@Manuel030, please add a check for empty text and skip adding such paragraphs.

Contributor Author

I see the comment on line 312:

 # for now retain empty paragraphs for backwards compatibility:

Hence, I would leave it as is, so as not to introduce another special case in def _get_paragraph_elements(self, paragraph: Paragraph)
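
The trade-off in this exchange can be sketched as a single flag. This is a minimal illustration only: `emit_paragraphs` and `keep_empty` are hypothetical names, not part of the backend, and the condition actually used in #1610 is not shown in this thread.

```python
def emit_paragraphs(texts, keep_empty=True):
    """Return the paragraph texts that would be added to the document.

    keep_empty=True retains empty paragraphs (the backwards-compatible
    behaviour the quoted code comment refers to); keep_empty=False applies
    the check the reviewer asks for and drops whitespace-only text.
    """
    emitted = []
    for text in texts:
        if not keep_empty and not text.strip():
            continue  # skip paragraphs that carry no visible text
        emitted.append(text)
    return emitted


texts = ["Here are some interesting things", "", "   ", "Last paragraph"]
print(len(emit_paragraphs(texts)))                # 4
print(emit_paragraphs(texts, keep_empty=False))   # ['Here are some interesting things', 'Last paragraph']
```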

@cau-git
Contributor

cau-git commented May 23, 2025

@Manuel030 @maxmnemonic There is apparently a newer PR with the same goal, #1610, which has the proper condition to avoid producing empty text paragraphs.
