fix: parse html with omitted body tag #818

Merged: 2 commits into main from fix/parse-html-without-optional-tags
Jan 27, 2025

@ceberam (Contributor) commented Jan 27, 2025

  • Parse HTML files that omit the <body> tag, since the tag is optional in the HTML5 specification
  • Add tests to ensure docling converts HTML documents without the <body> tag
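
To illustrate the idea behind the fix, here is a minimal sketch using only Python's standard `html.parser`. It is not docling's actual backend code: the `TextCollector` class and `extract_text` helper are hypothetical names invented for this example. It shows a parser that collects content the same way whether or not a `<body>` tag is present, rather than assuming the tag exists.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical helper: collect visible text with or without a <body> tag."""

    # Elements whose text is metadata or code, not document content.
    SKIPPED = {"title", "script", "style"}

    def __init__(self):
        super().__init__()
        self.saw_body = False   # True once an explicit <body> tag is seen
        self._skip_depth = 0    # > 0 while inside a skipped element
        self.chunks = []        # collected text fragments, in document order

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.saw_body = True
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_text(html: str):
    """Return (saw_body, chunks) for the given HTML string."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    return parser.saw_body, parser.chunks


with_body = "<html><head><title>T</title></head><body><h1>Hello</h1><p>World</p></body></html>"
without_body = "<!DOCTYPE html><title>T</title><h1>Hello</h1><p>World</p>"

print(extract_text(with_body))
print(extract_text(without_body))
```

Both inputs yield the same text fragments; only the `saw_body` flag differs. A backend that requires an explicit `<body>` element before reading content would fail on the second input, which is the situation reported in #810.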

Resolves #810

Checklist:

  • Documentation has been updated, if necessary.
  • Examples have been added, if necessary.
  • Tests have been added, if necessary.

Parse HTML files without 'body' tag, since it is optional in HTML5 specification.

Signed-off-by: Cesar Berrospi Ramis <[email protected]>
@ceberam added the bug (Something isn't working) and html (issue related to html backend) labels Jan 27, 2025
@ceberam self-assigned this Jan 27, 2025
mergify bot commented Jan 27, 2025

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:
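
As a rough illustration of what this rule checks, the sketch below applies the same regular expression with Python's `re` module. The `is_conventional` helper is a hypothetical name, and the sketch assumes the `~=` operator behaves like a regex search against the title; it is not Mergify's own implementation.

```python
import re

# Pattern copied from the merge protection rule above.
CONVENTIONAL_TITLE = re.compile(
    r"^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:"
)


def is_conventional(title: str) -> bool:
    """Return True if the title starts with a conventional commit prefix."""
    return CONVENTIONAL_TITLE.search(title) is not None


print(is_conventional("fix: parse html with omitted body tag"))
print(is_conventional("feat(html)!: drop legacy parser"))
print(is_conventional("Parse html with omitted body tag"))
```

The title of this pull request starts with `fix:`, which is why the rule is reported as succeeded.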

🟢 Require two reviewer for test updates

Wonderful, this rule succeeded.

When test data is updated, we require two reviewers

  • #approved-reviews-by >= 2

@dolfim-ibm (Contributor) left a comment

lgtm

@PeterStaar-IBM PeterStaar-IBM merged commit a112d7a into main Jan 27, 2025
9 checks passed
@PeterStaar-IBM PeterStaar-IBM deleted the fix/parse-html-without-optional-tags branch January 27, 2025 15:59
Development

Successfully merging this pull request may close these issues.

Error parsing html file where there's no <body> tag
3 participants