NLU-61: Jenkins release #266

Closed
wants to merge 34 commits into from
32609ec
Added Visual Document NER
gadde5300 Feb 27, 2024
a37eca1
new healthcare pipeline explain_clinical_doc_oncology added
SKocer Mar 24, 2024
ea6c982
new healthcare pipeline explain_clinical_doc_generic added
SKocer Mar 24, 2024
7843d62
new healthcare pipeline explain_clinical_doc_vop is added
SKocer Apr 4, 2024
706dffa
new healthcare pipeline hpo_resolver_pipeline is added
SKocer Apr 4, 2024
dedf2ad
new healthcare pipeline atc_resolver_pipeline is added
SKocer Apr 5, 2024
73686ea
Merge pull request #259 from JohnSnowLabs/skocer_05042024_NLU_atc_res…
C-K-Loan Apr 8, 2024
f76222d
Merge pull request #258 from JohnSnowLabs/skocer_04042024_NLU_hpo_res…
C-K-Loan Apr 8, 2024
fea7ad2
Merge pull request #257 from JohnSnowLabs/skocer_04042024_NLU_explain…
C-K-Loan Apr 8, 2024
d6bbe8d
Merge remote-tracking branch 'origin/release/531' into release/531
C-K-Loan Apr 8, 2024
a758f64
Merge remote-tracking branch 'origin/sk-24032024-nlu_explain_clinical…
C-K-Loan Apr 8, 2024
28ca5a0
Merge remote-tracking branch 'origin/visual-ner' into release/531
C-K-Loan Apr 8, 2024
9467995
decomment class
C-K-Loan Apr 8, 2024
843123a
TextMatcherInternalModel annotator, clinical_deidentification_generic…
SKocer Apr 22, 2024
7f211d9
Update README.md
C-K-Loan Apr 23, 2024
e530ce2
Merge pull request #249 from JohnSnowLabs/visual-ner
C-K-Loan Apr 23, 2024
fb196a1
Merge pull request #262 from JohnSnowLabs/skocer_22042024_NLU_clinica…
C-K-Loan Apr 23, 2024
823592d
update version
C-K-Loan-ADIA Apr 30, 2024
69c9452
Merge remote-tracking branch 'origin/release/531' into release/531
C-K-Loan-ADIA Apr 30, 2024
6efe108
Added Visual Form Relation Extractor
gadde5300 May 5, 2024
1f227de
Updated Form Relation Extractor
gadde5300 May 13, 2024
a568de8
init
WeichenXu123 May 20, 2024
780ca59
update
WeichenXu123 May 21, 2024
76161f0
Merge pull request #263 from JohnSnowLabs/release/531
C-K-Loan May 21, 2024
94b1c3d
Merge pull request #264 from WeichenXu123/fix-is_running_in_databricks
C-K-Loan May 21, 2024
8564e7a
version bump
C-K-Loan May 21, 2024
506860e
Merge pull request #265 from JohnSnowLabs/release/532
C-K-Loan May 21, 2024
a6e8099
add version file
faisaladnanpeltops Jun 5, 2024
a764cd4
bump to version 5.1.5rc21
faisaladnanpeltops Jun 8, 2024
b9e3373
config pylint
faisaladnanpeltops Jun 10, 2024
08326d1
delete .ci folder
faisaladnanpeltops Jun 10, 2024
721ab33
Merge branch 'master' into jenkins_release
faisaladnanpeltops Jun 14, 2024
c53c994
test version 1.1.11
faisaladnanpeltops Jun 14, 2024
903b5c2
Merge branch 'jenkins_release' of github.com:JohnSnowLabs/nlu into je…
faisaladnanpeltops Jun 14, 2024
2 changes: 1 addition & 1 deletion README.md
@@ -13,7 +13,7 @@ See how easy it is to use any of the **thousands** of models in 1 line of code,
This 1 line lets you visualize and play with **1000+ SOTA NLU & NLP models** in **200** languages

```shell
streamlit run https://raw.githubusercontent.com/JohnSnowLabs/nlu/master/examples/streamlit/01_dashboard.py
streamlit run https://raw.githubusercontent.com/JohnSnowLabs/nlu/master/examples/streamlit/01_dashboard.py
```
<img src="https://raw.githubusercontent.com/JohnSnowLabs/nlu/master/docs/assets/streamlit_docs_assets/gif/start.gif">

1 change: 1 addition & 0 deletions VERSION
@@ -0,0 +1 @@
1.1.11

Large diffs are not rendered by default.

183 changes: 183 additions & 0 deletions examples/colab/ocr/ocr_form_relation.ipynb
@@ -0,0 +1,183 @@
{
"cells": [
{
"cell_type": "markdown",
"source": [
"![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)\n",
"\n",
"[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/nlu/blob/master/examples/colab/ocr/ocr_form_relation_extractor.ipynb)\n",
"\n",
"[Tutorial Notebook](https://colab.research.google.com/github/JohnSnowLabs/nlu/blob/master/examples/colab/ocr/ocr_form_relation_extractor.ipynb \"https://colab.research.google.com/github/JohnSnowLabs/nlu/blob/master/examples/colab/ocr/ocr_form_relation_extractor.ipynb\")\n"
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "markdown",
"source": [
"# **FormRelationExtractor**\n",
"\n",
"\n",
"The **FormRelationExtractor** identifies the relationships between keys and values in a form. It is particularly useful for pairing the key and value entities produced by a visual Named Entity Recognition (NER) model, such as VisualDocumentNER.\n",
"\n",
"**All the available models:**\n",
"\n",
"| NLU Spell | Transformer Class |\n",
"|----------------------|-----------------------------------------------------------------------------------------|\n",
"| nlu.load(`visual_form_relation_extractor`) | [FormRelationExtractor](https://nlp.johnsnowlabs.com/docs/en/ocr_visual_document_understanding#formrelationextractor) |"
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "markdown",
"source": [
"## **Install NLU**"
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "code",
"execution_count": null,
"outputs": [],
"source": [
"!pip install johnsnowlabs\n",
"from johnsnowlabs import nlp\n",
"nlp.install(visual=True, force_browser=True)\n",
"nlp.start(visual=True)"
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "markdown",
"source": [
"## **Form Relation Extraction**"
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "code",
"execution_count": 1,
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"🚨 Outdated Medical Secrets in license file. Version=5.3.1 but should be Version=5.1.1\n",
"🚨 Outdated OCR Secrets in license file. Version=5.3.1 but should be Version=5.0.2\n",
"📋 Loading license number 0 from C:\\Users\\gadde/.johnsnowlabs\\licenses/license_number_0_for_Spark-Healthcare_Spark-OCR.json\n",
"👷 Trying to install compatible secrets. Use nlp.settings.enforce_versions=False if you want to install outdated secrets.\n",
"👷 Trying to install compatible secrets. Use nlp.settings.enforce_versions=False if you want to install outdated secrets.\n",
"👷 Setting up John Snow Labs home in C:\\Users\\gadde/.johnsnowlabs, this might take a few minutes.\n",
"Downloading 🫘+🚀 Java Library spark-nlp-assembly-5.1.1.jar\n",
"🙆 JSL Home setup in C:\\Users\\gadde/.johnsnowlabs\n",
"🤓 Looks like you are missing some jars, trying fetching them ...\n",
"👷 Trying to install compatible secrets. Use nlp.settings.enforce_versions=False if you want to install outdated secrets.\n",
"Downloading 🫘+💊 Java Library spark-nlp-jsl-5.1.1.jar\n",
"Downloading 🫘+🕶 Java Library spark-ocr-assembly-5.0.2.jar\n",
"🙆 JSL Home setup in C:\\Users\\gadde/.johnsnowlabs\n",
"👷 Trying to install compatible secrets. Use nlp.settings.enforce_versions=False if you want to install outdated secrets.\n",
"👌 Launched \u001B[92mcpu optimized\u001B[39m session with with: 🚀Spark-NLP==5.3.1, 💊Spark-Healthcare==5.1.1, 🕶Spark-OCR==5.0.2, running on ⚡ PySpark==3.1.2\n",
"Warning::Spark Session already created, some configs may not take.\n",
"Warning::Spark Session already created, some configs may not take.\n",
"lilt_roberta_funsd_v1 download started this may take some time.\n",
"Approximate size to download 419.6 MB\n"
]
}
],
"source": [
"from johnsnowlabs import nlp,visual\n",
"model = nlp.load('visual_form_relation_extractor')"
],
"metadata": {
"collapsed": false,
"ExecuteTime": {
"end_time": "2024-05-13T08:27:37.781697200Z",
"start_time": "2024-05-13T08:17:43.901075500Z"
}
}
},
{
"cell_type": "code",
"execution_count": 4,
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Warning::Spark Session already created, some configs may not take.\n"
]
}
],
"source": [
"res = model.predict(['form.png','form2.jpg'])"
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "code",
"execution_count": 5,
"outputs": [
{
"data": {
"text/plain": " form_relation_prediction_key form_relation_prediction_value \\\n0 division allied health \n0 course hce \n0 number 116 \n0 title calculations medical dosage \n0 credits 2 \n0 developed by dr . by taz \n0 lecture / lab lecture / o ratio 2 \n0 course activity no \n0 cip code 51 . 0800 \n0 semester fall and \n0 ge category none \n0 separate lab no \n0 course awareness no \n0 course no \n1 name : dribbler , bbb \n1 study date : 12 - 09 - 2006 , 6 : 34 \n1 bp : 120 / 80 mmhg \n1 mrn : 12341820060912 \n1 patient location : room \n1 hr : 100 bpm \n1 dob : 19 - 06 - 1979 \n1 gender : male \n1 height : 123 cm \n1 age : 27 years \n1 weight : 25 kg \n1 reason for study : mi \n1 bsa : 0 . 92 m \n1 history : asfgfdgsdg \n1 medications : heparine , paracetamol \n1 performed . the study technically limited . \n1 . no \n\n path \n0 file:/F:/Work/repos/nlu_new/ner/nlu/examples/c... \n0 file:/F:/Work/repos/nlu_new/ner/nlu/examples/c... \n0 file:/F:/Work/repos/nlu_new/ner/nlu/examples/c... \n0 file:/F:/Work/repos/nlu_new/ner/nlu/examples/c... \n0 file:/F:/Work/repos/nlu_new/ner/nlu/examples/c... \n0 file:/F:/Work/repos/nlu_new/ner/nlu/examples/c... \n0 file:/F:/Work/repos/nlu_new/ner/nlu/examples/c... \n0 file:/F:/Work/repos/nlu_new/ner/nlu/examples/c... \n0 file:/F:/Work/repos/nlu_new/ner/nlu/examples/c... \n0 file:/F:/Work/repos/nlu_new/ner/nlu/examples/c... \n0 file:/F:/Work/repos/nlu_new/ner/nlu/examples/c... \n0 file:/F:/Work/repos/nlu_new/ner/nlu/examples/c... \n0 file:/F:/Work/repos/nlu_new/ner/nlu/examples/c... \n0 file:/F:/Work/repos/nlu_new/ner/nlu/examples/c... \n1 file:/F:/Work/repos/nlu_new/ner/nlu/examples/c... \n1 file:/F:/Work/repos/nlu_new/ner/nlu/examples/c... \n1 file:/F:/Work/repos/nlu_new/ner/nlu/examples/c... \n1 file:/F:/Work/repos/nlu_new/ner/nlu/examples/c... \n1 file:/F:/Work/repos/nlu_new/ner/nlu/examples/c... \n1 file:/F:/Work/repos/nlu_new/ner/nlu/examples/c... \n1 file:/F:/Work/repos/nlu_new/ner/nlu/examples/c... 
\n1 file:/F:/Work/repos/nlu_new/ner/nlu/examples/c... \n1 file:/F:/Work/repos/nlu_new/ner/nlu/examples/c... \n1 file:/F:/Work/repos/nlu_new/ner/nlu/examples/c... \n1 file:/F:/Work/repos/nlu_new/ner/nlu/examples/c... \n1 file:/F:/Work/repos/nlu_new/ner/nlu/examples/c... \n1 file:/F:/Work/repos/nlu_new/ner/nlu/examples/c... \n1 file:/F:/Work/repos/nlu_new/ner/nlu/examples/c... \n1 file:/F:/Work/repos/nlu_new/ner/nlu/examples/c... \n1 file:/F:/Work/repos/nlu_new/ner/nlu/examples/c... \n1 file:/F:/Work/repos/nlu_new/ner/nlu/examples/c... ",
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>form_relation_prediction_key</th>\n <th>form_relation_prediction_value</th>\n <th>path</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>division</td>\n <td>allied health</td>\n <td>file:/F:/Work/repos/nlu_new/ner/nlu/examples/c...</td>\n </tr>\n <tr>\n <th>0</th>\n <td>course</td>\n <td>hce</td>\n <td>file:/F:/Work/repos/nlu_new/ner/nlu/examples/c...</td>\n </tr>\n <tr>\n <th>0</th>\n <td>number</td>\n <td>116</td>\n <td>file:/F:/Work/repos/nlu_new/ner/nlu/examples/c...</td>\n </tr>\n <tr>\n <th>0</th>\n <td>title</td>\n <td>calculations medical dosage</td>\n <td>file:/F:/Work/repos/nlu_new/ner/nlu/examples/c...</td>\n </tr>\n <tr>\n <th>0</th>\n <td>credits</td>\n <td>2</td>\n <td>file:/F:/Work/repos/nlu_new/ner/nlu/examples/c...</td>\n </tr>\n <tr>\n <th>0</th>\n <td>developed by</td>\n <td>dr . by taz</td>\n <td>file:/F:/Work/repos/nlu_new/ner/nlu/examples/c...</td>\n </tr>\n <tr>\n <th>0</th>\n <td>lecture / lab lecture / o ratio</td>\n <td>2</td>\n <td>file:/F:/Work/repos/nlu_new/ner/nlu/examples/c...</td>\n </tr>\n <tr>\n <th>0</th>\n <td>course activity</td>\n <td>no</td>\n <td>file:/F:/Work/repos/nlu_new/ner/nlu/examples/c...</td>\n </tr>\n <tr>\n <th>0</th>\n <td>cip code</td>\n <td>51 . 
0800</td>\n <td>file:/F:/Work/repos/nlu_new/ner/nlu/examples/c...</td>\n </tr>\n <tr>\n <th>0</th>\n <td>semester</td>\n <td>fall and</td>\n <td>file:/F:/Work/repos/nlu_new/ner/nlu/examples/c...</td>\n </tr>\n <tr>\n <th>0</th>\n <td>ge category</td>\n <td>none</td>\n <td>file:/F:/Work/repos/nlu_new/ner/nlu/examples/c...</td>\n </tr>\n <tr>\n <th>0</th>\n <td>separate lab</td>\n <td>no</td>\n <td>file:/F:/Work/repos/nlu_new/ner/nlu/examples/c...</td>\n </tr>\n <tr>\n <th>0</th>\n <td>course awareness</td>\n <td>no</td>\n <td>file:/F:/Work/repos/nlu_new/ner/nlu/examples/c...</td>\n </tr>\n <tr>\n <th>0</th>\n <td>course</td>\n <td>no</td>\n <td>file:/F:/Work/repos/nlu_new/ner/nlu/examples/c...</td>\n </tr>\n <tr>\n <th>1</th>\n <td>name :</td>\n <td>dribbler , bbb</td>\n <td>file:/F:/Work/repos/nlu_new/ner/nlu/examples/c...</td>\n </tr>\n <tr>\n <th>1</th>\n <td>study date :</td>\n <td>12 - 09 - 2006 , 6 : 34</td>\n <td>file:/F:/Work/repos/nlu_new/ner/nlu/examples/c...</td>\n </tr>\n <tr>\n <th>1</th>\n <td>bp :</td>\n <td>120 / 80 mmhg</td>\n <td>file:/F:/Work/repos/nlu_new/ner/nlu/examples/c...</td>\n </tr>\n <tr>\n <th>1</th>\n <td>mrn :</td>\n <td>12341820060912</td>\n <td>file:/F:/Work/repos/nlu_new/ner/nlu/examples/c...</td>\n </tr>\n <tr>\n <th>1</th>\n <td>patient location :</td>\n <td>room</td>\n <td>file:/F:/Work/repos/nlu_new/ner/nlu/examples/c...</td>\n </tr>\n <tr>\n <th>1</th>\n <td>hr :</td>\n <td>100 bpm</td>\n <td>file:/F:/Work/repos/nlu_new/ner/nlu/examples/c...</td>\n </tr>\n <tr>\n <th>1</th>\n <td>dob :</td>\n <td>19 - 06 - 1979</td>\n <td>file:/F:/Work/repos/nlu_new/ner/nlu/examples/c...</td>\n </tr>\n <tr>\n <th>1</th>\n <td>gender :</td>\n <td>male</td>\n <td>file:/F:/Work/repos/nlu_new/ner/nlu/examples/c...</td>\n </tr>\n <tr>\n <th>1</th>\n <td>height :</td>\n <td>123 cm</td>\n <td>file:/F:/Work/repos/nlu_new/ner/nlu/examples/c...</td>\n </tr>\n <tr>\n <th>1</th>\n <td>age :</td>\n <td>27 years</td>\n 
<td>file:/F:/Work/repos/nlu_new/ner/nlu/examples/c...</td>\n </tr>\n <tr>\n <th>1</th>\n <td>weight :</td>\n <td>25 kg</td>\n <td>file:/F:/Work/repos/nlu_new/ner/nlu/examples/c...</td>\n </tr>\n <tr>\n <th>1</th>\n <td>reason for study :</td>\n <td>mi</td>\n <td>file:/F:/Work/repos/nlu_new/ner/nlu/examples/c...</td>\n </tr>\n <tr>\n <th>1</th>\n <td>bsa :</td>\n <td>0 . 92 m</td>\n <td>file:/F:/Work/repos/nlu_new/ner/nlu/examples/c...</td>\n </tr>\n <tr>\n <th>1</th>\n <td>history :</td>\n <td>asfgfdgsdg</td>\n <td>file:/F:/Work/repos/nlu_new/ner/nlu/examples/c...</td>\n </tr>\n <tr>\n <th>1</th>\n <td>medications :</td>\n <td>heparine , paracetamol</td>\n <td>file:/F:/Work/repos/nlu_new/ner/nlu/examples/c...</td>\n </tr>\n <tr>\n <th>1</th>\n <td>performed .</td>\n <td>the study technically limited .</td>\n <td>file:/F:/Work/repos/nlu_new/ner/nlu/examples/c...</td>\n </tr>\n <tr>\n <th>1</th>\n <td>.</td>\n <td>no</td>\n <td>file:/F:/Work/repos/nlu_new/ner/nlu/examples/c...</td>\n </tr>\n </tbody>\n</table>\n</div>"
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"res_filtered = res[['form_relation_prediction_key','form_relation_prediction_value','path']]\n",
"res_filtered"
],
"metadata": {
"collapsed": false,
"ExecuteTime": {
"end_time": "2024-05-13T08:40:51.701641600Z",
"start_time": "2024-05-13T08:40:51.627215600Z"
}
}
},
{
"cell_type": "code",
"execution_count": null,
"outputs": [],
"source": [],
"metadata": {
"collapsed": false
}
}
],
"metadata": {
"kernelspec": {
"name": "myenv",
"language": "python",
"display_name": "myenv"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.6"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
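The notebook output above holds one row per predicted key/value pair, with the row index identifying the source image. A minimal sketch of folding such rows into one dictionary per document follows. It assumes the rows have already been exported as plain `(index, key, value)` tuples, which the notebook does not do; `group_relations` is a hypothetical helper, not part of NLU.

```python
from collections import defaultdict


def group_relations(rows):
    """Fold (doc_index, key, value) rows into {doc_index: {key: [values]}}."""
    grouped = defaultdict(lambda: defaultdict(list))
    for doc_index, key, value in rows:
        # Keys such as "name :" carry a trailing colon in the sample output.
        clean_key = key.rstrip(" :").strip()
        grouped[doc_index][clean_key].append(value.strip())
    return {doc: dict(fields) for doc, fields in grouped.items()}


# A few rows copied from the sample output; the tuple layout is an assumption.
rows = [
    (0, "division", "allied health"),
    (0, "course", "hce"),
    (0, "course", "no"),
    (1, "name :", "dribbler , bbb"),
    (1, "hr :", "100 bpm"),
]
forms = group_relations(rows)
print(forms[0]["course"])
print(forms[1]["name"])
```

Values are kept as lists because the same key can be predicted more than once per form, as `course` is in the sample output.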
968 changes: 968 additions & 0 deletions examples/colab/ocr/ocr_visual_document_ner.ipynb

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion examples/colab/ocr/table_extraction.ipynb
@@ -2752,4 +2752,4 @@
},
"nbformat": 4,
"nbformat_minor": 0
}
}
12 changes: 8 additions & 4 deletions nlu/__init__.py
@@ -1,6 +1,3 @@
__version__ = '5.1.5rc19'


import nlu.utils.environment.env_utils as env_utils

if not env_utils.try_import_pyspark_in_streamlit():
@@ -23,6 +20,13 @@
import pandas as pd
pd.DataFrame.iteritems = pd.DataFrame.items

def version():
    version_path = os.path.abspath(os.path.dirname(__file__))
    with open(os.path.join(version_path, '../VERSION'), encoding="utf-8") as version_file:
        return version_file.read().strip()

__version__ = version()

def version(): return __version__
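
This hunk replaces the hard-coded `__version__` with a value read from the new top-level `VERSION` file. A self-contained sketch of the same idea using `pathlib` follows, assuming the file sits one directory above the package; `read_version` and the temporary directory layout are illustrative and not the PR's code.

```python
import tempfile
from pathlib import Path


def read_version(package_dir: Path) -> str:
    """Return the stripped contents of VERSION one level above package_dir."""
    version_file = package_dir.parent / "VERSION"
    return version_file.read_text(encoding="utf-8").strip()


# Build a throwaway layout: <tmp>/VERSION next to <tmp>/nlu/
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "VERSION").write_text("1.1.11\n", encoding="utf-8")
    package_dir = root / "nlu"
    package_dir.mkdir()
    detected = read_version(package_dir)

print(detected)
```

One thing to check with this layout: `../VERSION` lives outside the package directory, so an installed copy will only find it if the file is shipped alongside the package.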


@@ -325,7 +329,7 @@ def load_nlu_pipe_from_hdd(pipe_path, request) -> NLUPipeline:
If it is a component_list, load the component_list and return it.
If it is a singular model_anno_obj, load it to the correct AnnotatorClass and NLU component_to_resolve and then generate pipeline for it
"""
if is_running_in_databricks():
if is_running_in_databricks_runtime():
return load_nlu_pipe_from_hdd_in_databricks(pipe_path, request)
pipe = NLUPipeline()
pipe.nlu_ref = request
Empty file.
@@ -0,0 +1,8 @@

class FormRelationExtractor:
    @staticmethod
    def get_default_model():
        from sparkocr.transformers import FormRelationExtractor
        return FormRelationExtractor() \
            .setInputCol("text_entity") \
            .setOutputCol("ocr_relations")
Empty file.
7 changes: 7 additions & 0 deletions nlu/ocr_components/utils/hocr_tokenizer/hocr_tokenizer.py
@@ -0,0 +1,7 @@
class HocrTokenizer:
    @staticmethod
    def get_default_model():
        from sparkocr.transformers import HocrTokenizer
        return HocrTokenizer() \
            .setInputCol("hocr") \
            .setOutputCol("text_tokenized")
Empty file.
Empty file.
@@ -0,0 +1,8 @@
class VisualDocumentNer:
    @staticmethod
    def get_default_model():
        from sparkocr.transformers import VisualDocumentNer
        return VisualDocumentNer()\
            .pretrained("lilt_roberta_funsd_v1", "en", "clinical/ocr")\
            .setInputCols(["text_tokenized", "image"])\
            .setOutputCol("text_entity")
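
The three wrappers added in this PR chain through their column names: `HocrTokenizer` writes `text_tokenized`, `VisualDocumentNer` reads it together with `image` and writes `text_entity`, and `FormRelationExtractor` reads `text_entity`. A small sketch that checks this wiring without Spark follows. The `Stage` type and `check_wiring` helper are hypothetical, and the premise that `hocr` and `image` come from earlier stages is an assumption, since the diff does not show where they are created.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Stage:
    name: str
    inputs: List[str]
    output: str


def check_wiring(stages: List[Stage], available: Set[str]) -> List[str]:
    """Return 'stage: column' entries for inputs that nothing produced earlier."""
    missing = []
    produced = set(available)
    for stage in stages:
        for col in stage.inputs:
            if col not in produced:
                missing.append(f"{stage.name}: {col}")
        produced.add(stage.output)
    return missing


# Column names copied from the default models in this diff.
stages = [
    Stage("HocrTokenizer", ["hocr"], "text_tokenized"),
    Stage("VisualDocumentNer", ["text_tokenized", "image"], "text_entity"),
    Stage("FormRelationExtractor", ["text_entity"], "ocr_relations"),
]

# Assumed upstream columns; the diff does not show which stages create them.
problems = check_wiring(stages, available={"hocr", "image"})
problems_without_image = check_wiring(stages, available={"hocr"})
print(problems)
print(problems_without_image)
```

A check like this surfaces a renamed or missing column before the pipeline is fitted.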