Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add ContentLayer attribute to designate items to body or furniture #148

Merged
merged 10 commits into from
Feb 10, 2025

Conversation

cau-git
Copy link
Contributor

@cau-git cau-git commented Feb 3, 2025

No description provided.

Copy link

mergify bot commented Feb 3, 2025

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:

🟢 Require two reviewer for test updates

Wonderful, this rule succeeded.

When test data is updated, we require two reviewers

  • #approved-reviews-by >= 2

PeterStaar-IBM
PeterStaar-IBM previously approved these changes Feb 5, 2025
Copy link
Contributor

@PeterStaar-IBM PeterStaar-IBM left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great!

docling_core/types/doc/document.py Outdated Show resolved Hide resolved
# Since version 1.1.0, all NodeItems carry content_layer property.
# We must assign previous page_header and page_footer instances to furniture.
# Note: model_validators which check on the version must use "before".
if "version" in data and data["version"] == "1.0.0":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we have already logic in docling-core about checking version numbers: https://github.com/DS4SD/docling-core/blob/main/docling_core/types/doc/document.py#L2947

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but why do we need that logic? This is not about determining if the version is compatible, it is specifically checking if the version is 1.0.0 (the only version in existence prior to the current version).

Copy link
Contributor

@dolfim-ibm dolfim-ibm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We still have to add PAGE_HEADER and PAGE_FOOTER to DEFAULT_EXPORT_LABELS

docling_core/types/doc/document.py Outdated Show resolved Hide resolved
Signed-off-by: Christoph Auer <[email protected]>
@cau-git
Copy link
Contributor Author

cau-git commented Feb 5, 2025

@dolfim-ibm Problem with putting PAGE_HEADER and PAGE_FOOTER in the default labels:

  • Old docling versions which use this version of docling-core will undesiredly start outputting the headers and footers.
  • Technically it breaks backwards compatiblity

@cau-git cau-git marked this pull request as draft February 6, 2025 18:22
@cau-git cau-git marked this pull request as ready for review February 7, 2025 11:17
@cau-git cau-git merged commit 786f0c6 into main Feb 10, 2025
8 checks passed
@cau-git cau-git deleted the cau/add-content-layer branch February 10, 2025 09:58
Saidgurbuz pushed a commit that referenced this pull request Feb 13, 2025
…ure (#148)

* feat: Add ContentLayer attribute to designate items to body or furniture

Signed-off-by: Christoph Auer <[email protected]>

* introduce safer data gen mechanism, update chunking test data

Signed-off-by: Panos Vagenas <[email protected]>

* Do not make test rely on order in yaml

Signed-off-by: Christoph Auer <[email protected]>

* chore: format fixes

Signed-off-by: Christoph Auer <[email protected]>

* fix: legacy_to_docling_doc must use content_layer

Signed-off-by: Christoph Auer <[email protected]>

* Add content_layer in iterate_items

Signed-off-by: Christoph Auer <[email protected]>

* Bump format version, add model_validator for old page_header,page_footer in body

Signed-off-by: Christoph Auer <[email protected]>

* fix: Change to before model_validator

Signed-off-by: Christoph Auer <[email protected]>

* Update tests

Signed-off-by: Christoph Auer <[email protected]>

* Address review comments

Signed-off-by: Christoph Auer <[email protected]>

---------

Signed-off-by: Christoph Auer <[email protected]>
Signed-off-by: Panos Vagenas <[email protected]>
Co-authored-by: Panos Vagenas <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants