Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rui/use system deps #80

Merged
merged 12 commits into from
Oct 9, 2024
Merged

Rui/use system deps #80

merged 12 commits into from
Oct 9, 2024

Conversation

rmdg88
Copy link
Collaborator

@rmdg88 rmdg88 commented Oct 4, 2024

  • updated pyproject.toml to use pybind at build-system
  • updated pyproject.toml include = [ ] to include more paths
  • updated build.py
  • add linux sdist build step
  • updated cmake external deps to optionally use system deps and use pybind from poetry
  • bumped external deps CXXOPTS to v3.2.0
  • updated external deps LOGURU install method
  • add rhel container to build with system deps
  • add ci test step for windows and mac
  • fix modules load_pretrained_nlp_models, load_training_data to properly construct URL and be compatible cross platform to load tests

@rmdg88 rmdg88 force-pushed the rui/use-system-deps branch from 852160b to 4849b22 Compare October 4, 2024 20:46
@rmdg88 rmdg88 requested a review from dolfim-ibm October 4, 2024 20:48
@rmdg88 rmdg88 force-pushed the rui/use-system-deps branch from c43f9d7 to f010910 Compare October 4, 2024 20:53
rmdg88 added 4 commits October 6, 2024 19:26
Signed-off-by: rmdg88 <[email protected]>
Signed-off-by: rmdg88 <[email protected]>
Signed-off-by: rmdg88 <[email protected]>
@rmdg88 rmdg88 force-pushed the rui/use-system-deps branch from d54861c to 6fa3bcb Compare October 6, 2024 18:26
@rmdg88
Copy link
Collaborator Author

rmdg88 commented Oct 6, 2024

regarding the sys deps build:

  • the loguru-devel does not include loguru.cpp, it only include loguru.hpp i tried to update that in the existing files, but fails to work properly, i guess the code needs it (maybe the .rpm can be packaged with it) as workaround i've added the git clone and make install in the build_rhel.sh to move forward

  • at the moment, there is only one test failing, train tokenizer this one:

    # train tokenizer
    def test_07B():
    resources_dir = get_resources_dir()
    txt_file = f"{resources_dir}/data/text/arxiv-abstracts-2020-Jan.txt"
    assert os.path.exists(txt_file)
    model_type = "unigram"
    model_name = "test-tokenizer-model"
    print(f"training on {txt_file}")
    model_file = create_tok_model(
    model_type=model_type, model_name=model_name, ifile=txt_file
    )
    assert os.path.exists(model_name + ".model")
    assert os.path.exists(model_name + ".vocab")
    # load custom tokenizer
    def test_07C():
    model_name = "test-tokenizer-model"
    model_file = f"{model_name}.model"
    assert os.path.exists(model_file)
    model = init_nlp_model(
    f"language;custom_spm({model_name}:{model_file})",
    filters=["properties", "instances"],
    )

see:

Screenshot 2024-10-06 164432

  • dunno if is something related with sentencepiece_train we use 2 static libs in the compile from source version, or if is something else

  • i've checked and the sentencepiece-devel installs both libs

Screenshot 2024-10-05 174957

  • all the other tests pass in the sys deps version

Copy link
Contributor

@dolfim-ibm dolfim-ibm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 💯

@dolfim-ibm dolfim-ibm merged commit 8e93d57 into main Oct 9, 2024
31 checks passed
@dolfim-ibm dolfim-ibm deleted the rui/use-system-deps branch October 9, 2024 14:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants