evaluating new reference model #29

Merged: 27 commits merged on Dec 3, 2023
27 commits
43cc8a1 evaluating new reference model (PeterStaar-IBM, Nov 10, 2023)
8972c76 updating the trainers (PeterStaar-IBM, Nov 12, 2023)
13dc22e refactoring the properties in a document (PeterStaar-IBM, Nov 14, 2023)
2421230 updated the tests (PeterStaar-IBM, Nov 15, 2023)
8f4c248 merged with main (PeterStaar-IBM, Nov 15, 2023)
6796127 working on the tests (PeterStaar-IBM, Nov 15, 2023)
b64385c working on the tests (2) (PeterStaar-IBM, Nov 15, 2023)
6e89e28 merged with main (PeterStaar-IBM, Nov 16, 2023)
8bbe70a fixing tests one by one (PeterStaar-IBM, Nov 16, 2023)
e5b051a fixed the test (PeterStaar-IBM, Nov 16, 2023)
6dd246e updating document structure (PeterStaar-IBM, Nov 17, 2023)
815c65c Merge branch 'main' into dev/upgrade-reference-model (PeterStaar-IBM, Nov 17, 2023)
050de9a updated the test_glm (PeterStaar-IBM, Nov 17, 2023)
1022752 bumped version to 0.7.0 (PeterStaar-IBM, Nov 17, 2023)
44cb878 removed unnecessary functions (PeterStaar-IBM, Nov 17, 2023)
dc26289 updated the instances (PeterStaar-IBM, Nov 17, 2023)
bad3a86 working on reproducibility (PeterStaar-IBM, Nov 20, 2023)
f4fa114 quick commit (PeterStaar-IBM, Nov 24, 2023)
74bf3f1 fixed some of the tests (PeterStaar-IBM, Nov 27, 2023)
cbcbb63 all nlp tests pass (PeterStaar-IBM, Nov 28, 2023)
c87de6f updated the tokenization (PeterStaar-IBM, Nov 29, 2023)
640e3e5 small updates (PeterStaar-IBM, Nov 29, 2023)
fdc1e29 merged with main (PeterStaar-IBM, Nov 29, 2023)
8e4e899 improved reference output (PeterStaar-IBM, Dec 1, 2023)
f689409 updated test02A (PeterStaar-IBM, Dec 1, 2023)
41b8b2f updated the models (PeterStaar-IBM, Dec 3, 2023)
f30bbe1 updated the models (2) (PeterStaar-IBM, Dec 3, 2023)
6 changes: 2 additions & 4 deletions deepsearch_glm/glm_utils.py
@@ -43,8 +43,7 @@ def load_glm_config(idir:str):
def load_glm(idir:str):

    config = load_glm_config(idir)

    #glm = andromeda_glm.glm_model()

    glm = glm_model()
    glm.load(config)

@@ -60,7 +59,7 @@ def create_glm_config_from_docs(odir:str, json_files:list[str],
},
"save": {
"root": odir,
"write-CSV": True,
"write-CSV": False,
"write-JSON": False,
"write-path-text": False
}
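In this hunk, `"write-CSV"` changes from `True` to `False` in the `"save"` section that `create_glm_config_from_docs` builds. CSV output is therefore no longer requested by default, and all three writer flags in that section are now off.

The following is a minimal sketch of that section as a plain dict, under stated assumptions. The helpers `make_save_section` and `enabled_writers`, and the idea of passing the flags as keyword arguments, are hypothetical illustrations. They are not part of deepsearch_glm.

```python
from typing import Any, Dict, List


def make_save_section(odir: str,
                      write_csv: bool = False,
                      write_json: bool = False,
                      write_path_text: bool = False) -> Dict[str, Any]:
    """Hypothetical helper: build the "save" block of a GLM config.

    The keys mirror the diff; with this change all three writers
    default to False.
    """
    return {
        "root": odir,
        "write-CSV": write_csv,
        "write-JSON": write_json,
        "write-path-text": write_path_text,
    }


def enabled_writers(save: Dict[str, Any]) -> List[str]:
    """Return the names of the writer flags that are switched on."""
    return [key for key, value in save.items()
            if key.startswith("write-") and value]


if __name__ == "__main__":
    print(enabled_writers(make_save_section("./glm-out")))  # []
    # A caller that still wants CSV output would have to opt in explicitly.
    print(enabled_writers(make_save_section("./glm-out", write_csv=True)))  # ['write-CSV']
```

Keeping the flags as explicit booleans means a caller can read off which outputs a config will request without running the model.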
@@ -129,7 +128,6 @@ def create_glm_from_docs(odir:str, json_files:list[str],

    config = create_glm_config_from_docs(odir, json_files, nlp_models)

    #glm = andromeda_glm.glm_model()
    glm = glm_model()
    glm.create(config)

14 changes: 13 additions & 1 deletion deepsearch_glm/nlp_apply_on_docs.py
@@ -7,8 +7,9 @@

import pandas as pd

from utils.ds_utils import convert_pdffiles, to_legacy_document_format
from tabulate import tabulate

from utils.ds_utils import convert_pdffiles, to_legacy_document_format
from deepsearch_glm.andromeda_nlp import nlp_model

def parse_arguments():
@@ -100,6 +101,14 @@ def init_nlp_model(models:str, filters:list[str]=[]):

    return model

def show_texts(doc_j):

    data=[]
    for item in doc_j["texts"]:
        data.append([item["hash"], item["text-hash"], item["text"][0:48]])

    print(tabulate(data, headers=["hash", "text-hash", "text"]))

def show_doc(doc_j):

    """
@@ -125,6 +134,9 @@ def show_doc(doc_j):
    print(json.dumps(doc_j["tables"][0], indent=2))
    """

    if "texts" in doc_j:
        show_texts(doc_j)

    if "properties" in doc_j:
        props = pd.DataFrame(doc_j["properties"]["data"],
                             columns=doc_j["properties"]["headers"])
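The new `show_texts` prints one row per entry in `doc_j["texts"]`. Each row holds the item's `hash`, its `text-hash`, and the first 48 characters of `text`, rendered with `tabulate`. `show_doc` now calls it whenever `"texts"` is present. The `"properties"` block pairs `headers` with each row of `data` to build a DataFrame.

The following is a minimal, standard-library-only sketch of both steps, so it runs without `tabulate` or pandas. It swaps in simple string padding for `tabulate` and plain dicts for the DataFrame. The function names `text_rows`, `property_records` and `format_table`, and the toy document with its property headers, are hypothetical.

```python
from typing import Any, Dict, List


def text_rows(doc_j: Dict[str, Any], width: int = 48) -> List[List[str]]:
    """Collect [hash, text-hash, truncated text] rows, as show_texts does."""
    rows = []
    for item in doc_j.get("texts", []):
        rows.append([str(item["hash"]), str(item["text-hash"]), item["text"][0:width]])
    return rows


def property_records(doc_j: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Pair each properties row with its headers.

    Stdlib stand-in for pd.DataFrame(data, columns=headers).
    """
    props = doc_j.get("properties", {})
    headers = props.get("headers", [])
    return [dict(zip(headers, row)) for row in props.get("data", [])]


def format_table(rows: List[List[str]], headers: List[str]) -> str:
    """Left-aligned plain-text table, loosely similar to tabulate's default."""
    widths = [len(h) for h in headers]
    for row in rows:
        widths = [max(w, len(cell)) for w, cell in zip(widths, row)]
    lines = ["  ".join(h.ljust(w) for h, w in zip(headers, widths)),
             "  ".join("-" * w for w in widths)]
    for row in rows:
        lines.append("  ".join(cell.ljust(w) for cell, w in zip(row, widths)))
    return "\n".join(lines)


if __name__ == "__main__":
    # Toy document: hashes, texts and property headers are made up.
    doc = {
        "texts": [{"hash": 1, "text-hash": 11, "text": "A" * 60},
                  {"hash": 2, "text-hash": 22, "text": "short"}],
        "properties": {"headers": ["type", "label"],
                       "data": [["text", "title"]]},
    }
    print(format_table(text_rows(doc), ["hash", "text-hash", "text"]))
    print(property_records(doc))  # [{'type': 'text', 'label': 'title'}]
```

Printing `text-hash` next to `hash` makes it easy to spot entries whose text content is identical, which is useful when comparing reference outputs.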