feat: Add CodeItem as pydantic type, update export methods and APIs #129

Merged
cau-git merged 17 commits into main from code_item on Jan 17, 2025

Conversation

Matteo-Omenetti
Contributor

No description provided.

mergify bot commented Jan 15, 2025

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:

🟢 Require two reviewer for test updates

Wonderful, this rule succeeded.

When test data is updated, we require two reviewers

  • #approved-reviews-by >= 2

@cau-git cau-git changed the title Code item feat: Add CodeItem as pydantic type Jan 15, 2025
@cau-git cau-git changed the title feat: Add CodeItem as pydantic type feat: Add CodeItem as pydantic type, update export methods and APIs Jan 15, 2025
@cau-git cau-git marked this pull request as ready for review January 15, 2025 10:36
@cau-git cau-git requested review from cau-git and dolfim-ibm January 15, 2025 10:36
Matteo-Omenetti added 5 commits January 15, 2025 12:14
Signed-off-by: Matteo-Omenetti <[email protected]>
Signed-off-by: Matteo-Omenetti <[email protected]>
Signed-off-by: Matteo-Omenetti <[email protected]>
Signed-off-by: Matteo-Omenetti <[email protected]>
Signed-off-by: Matteo-Omenetti <[email protected]>
cau-git
cau-git previously approved these changes Jan 15, 2025
dolfim-ibm
dolfim-ibm previously approved these changes Jan 17, 2025
Contributor

@dolfim-ibm dolfim-ibm left a comment

LGTM

@dolfim-ibm
Contributor

I think we should consider having a stable enum for the programming languages anyway. From the test above, I see upcoming problems with Python vs python vs py, CSharp vs C# vs c#, etc.
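
One way to picture the concern is a small normalization step that maps free-form spellings onto a fixed set of enum members. The sketch below is only an illustration: the class name `CodeLanguage`, the helper `normalize_language`, and the alias table are hypothetical and not taken from this PR. Only the three enum values are copied from the list under review.

```python
from enum import Enum


class CodeLanguage(str, Enum):
    """Hypothetical name; a three-member subset of the enum under review."""

    C_SHARP = "C#"
    PYTHON = "Python"
    UNKNOWN = "unknown"


# Hypothetical alias table, keyed by lowercased spelling.
_ALIASES = {
    "py": CodeLanguage.PYTHON,
    "csharp": CodeLanguage.C_SHARP,
}


def normalize_language(raw: str) -> CodeLanguage:
    """Map a free-form language string to a stable enum member."""
    key = raw.strip().lower()
    # First try a case-insensitive match against the canonical values.
    for member in CodeLanguage:
        if member.value.lower() == key:
            return member
    # Then fall back to known aliases, and finally to UNKNOWN.
    return _ALIASES.get(key, CodeLanguage.UNKNOWN)
```

With this in place, `Python`, `python`, and `py` all resolve to the same member, and unrecognized spellings fall back to `UNKNOWN` instead of creating a new ad hoc string.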

Comment on lines 146 to 202
ADA = "Ada"
AWK = "Awk"
BASH = "Bash"
C = "C"
C_SHARP = "C#"
C_PLUS_PLUS = "C++"
CMAKE = "CMake"
COBOL = "COBOL"
CSS = "CSS"
CEYLON = "Ceylon"
CLOJURE = "Clojure"
CRYSTAL = "Crystal"
CUDA = "Cuda"
CYTHON = "Cython"
D = "D"
DART = "Dart"
DOCKERFILE = "Dockerfile"
ELIXIR = "Elixir"
ERLANG = "Erlang"
FORTRAN = "FORTRAN"
FORTH = "Forth"
GO = "Go"
HTML = "HTML"
HASKELL = "Haskell"
HAXE = "Haxe"
JAVA = "Java"
JAVASCRIPT = "JavaScript"
JULIA = "Julia"
KOTLIN = "Kotlin"
LISP = "Lisp"
LUA = "Lua"
MATLAB = "Matlab"
MOONSCRIPT = "MoonScript"
NIM = "Nim"
OCAML = "OCaml"
OBJECTIVEC = "ObjectiveC"
OCTAVE = "Octave"
PHP = "PHP"
PASCAL = "Pascal"
PERL = "Perl"
PROLOG = "Prolog"
PYTHON = "Python"
RACKET = "Racket"
RUBY = "Ruby"
RUST = "Rust"
SML = "SML"
SQL = "SQL"
SCALA = "Scala"
SCHEME = "Scheme"
SWIFT = "Swift"
TYPESCRIPT = "TypeScript"
VISUALBASIC = "VisualBasic"
XML = "XML"
YAML = "YAML"
BC = "bc"
DC = "dc"
UNKNOWN = "unknown"
Collaborator

These are partially sorted alphabetically, but not completely. It would be best to make the sorting consistent.
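
A consistent ordering could be guarded by a small check over the member names. This is a rough sketch, not code from the PR: the class name `CodeLanguage` and the helper `out_of_order_members` are hypothetical, and the enum is shortened to a few of the members listed above. It sorts by member name, which is one possible convention. Sorting by string value would give a different order (for example `"C#"` sorts before `"C++"`, while `C_PLUS_PLUS` sorts before `C_SHARP`).

```python
from enum import Enum


class CodeLanguage(str, Enum):
    """Hypothetical name; a shortened copy of the list, ordered by member name."""

    ADA = "Ada"
    AWK = "Awk"
    BASH = "Bash"
    BC = "bc"
    C = "C"
    C_PLUS_PLUS = "C++"
    C_SHARP = "C#"
    UNKNOWN = "unknown"


def out_of_order_members(enum_cls):
    """Return the names that sort before their predecessor in definition order."""
    names = [member.name for member in enum_cls]
    # Compare each name with the one defined immediately before it.
    return [cur for prev, cur in zip(names, names[1:]) if cur < prev]
```

A unit test can then assert that `out_of_order_members(CodeLanguage)` is empty, so a new language added in the wrong place fails fast.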

@Matteo-Omenetti Matteo-Omenetti force-pushed the code_item branch 2 times, most recently from fa63048 to 86cbef2 Compare January 17, 2025 15:50
@cau-git cau-git requested review from dolfim-ibm and vagenas January 17, 2025 15:52
@cau-git cau-git merged commit c940aa5 into main Jan 17, 2025
14 checks passed
@cau-git cau-git deleted the code_item branch January 17, 2025 15:59
muhark added a commit to muhark/docling-core that referenced this pull request Mar 19, 2025
…ocling-project#129)

* added code item

* added code item

* added code item

Signed-off-by: Matteo-Omenetti <[email protected]>

* added code item

Signed-off-by: Matteo-Omenetti <[email protected]>

* added code item

Signed-off-by: Matteo-Omenetti <[email protected]>

* added code item

Signed-off-by: Matteo-Omenetti <[email protected]>

* added code item

Signed-off-by: Matteo-Omenetti <[email protected]>

* add constraints to allow numpy > 2.1.0 on python3.13 and others

Signed-off-by: Michele Dolfi <[email protected]>

* Add CodeItem to ContentItem

Signed-off-by: Christoph Auer <[email protected]>

* added CodeItem in ContentItem tagged union.

* added enum for programming languages

* removed double CodeItem in ContentItem Union

* fixed type of code_language in CodeItem class

* fixed sorting of programming languages, not sorted anymore by value of string but variable name

---------

Signed-off-by: Matteo-Omenetti <[email protected]>
Signed-off-by: Michele Dolfi <[email protected]>
Signed-off-by: Christoph Auer <[email protected]>
Co-authored-by: Matteo-Omenetti <[email protected]>
Co-authored-by: Michele Dolfi <[email protected]>
Co-authored-by: Christoph Auer <[email protected]>
4 participants