The goal is to classify the Cause and Body Part from the description of an injury using the BERT model. The dataset was processed and saved as CSV files. For training, the bert-base-uncased model from HuggingFace was used. Two models were built: one for classifying Cause and another for classifying Body Part.
The Jupyter notebook files are in the Model Training folder. The train.csv and test.csv files are inside the ‘data’ directory. The data directory is also used to store the processed CSV files that are loaded later during model training.
Loading the train.csv file, we can see that it has 6 columns [LossDescription, ResultingInjuryDesc, PartInjuredDesc, Cause - Hierarchy 1, Body Part - Hierarchy 1, Index]; it is loaded using the Index column as the index.
The texts in [LossDescription, ResultingInjuryDesc, PartInjuredDesc] were merged into a single column called ‘description’, and the original three columns were dropped.
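A minimal sketch of this preprocessing step, assuming pandas and the path data/train.csv (the exact notebook code may differ):

```python
import pandas as pd

# Load train.csv, using the Index column as the dataframe index (assumed path).
df = pd.read_csv("data/train.csv", index_col="Index")

# Merge the three free-text columns into a single 'description' column,
# then drop the originals.
text_cols = ["LossDescription", "ResultingInjuryDesc", "PartInjuredDesc"]
df["description"] = df[text_cols].fillna("").agg(" ".join, axis=1)
df = df.drop(columns=text_cols)
```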
Now our dataframe only has 3 columns, namely [Cause - Hierarchy 1, Body Part - Hierarchy 1, description].
The text inside the dataframe is converted to lowercase.
Then we split the dataframe into two separate dataframes, because we need to classify the two targets separately. Once split, the null values are dropped from each dataframe.
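A sketch of the lowercasing and splitting step (column selections follow the description above; the notebook's exact code may differ):

```python
# Lowercase the merged description text.
df["description"] = df["description"].str.lower()

# Split into one dataframe per classification target and drop rows with
# a missing target value.
cause_df = df[["description", "Cause - Hierarchy 1"]].dropna()
bodypart_df = df[["description", "Body Part - Hierarchy 1"]].dropna()
```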
We can see that there are 12 unique classes in Cause - Hierarchy 1 and 7 in Body Part - Hierarchy 1.
Now the class names are encoded as integer labels in a new column called label.
These dataframes are then saved as CSV files inside the data directory, containing only the columns [description, label].
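A sketch of the label-encoding and export step; the use of pd.factorize and the output file names cause.csv / bodypart.csv are assumptions:

```python
# Encode each class name as an integer label; factorize also returns the
# class-name order, which is needed later to map predictions back to names.
cause_df["label"], cause_classes = pd.factorize(cause_df["Cause - Hierarchy 1"])
bodypart_df["label"], bodypart_classes = pd.factorize(bodypart_df["Body Part - Hierarchy 1"])

# Keep only the columns used in training and save to the data directory.
cause_df[["description", "label"]].to_csv("data/cause.csv", index=False)
bodypart_df[["description", "label"]].to_csv("data/bodypart.csv", index=False)
```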
I used the bert-base-uncased pre-trained model from HuggingFace for classification. I loaded the model and fine-tuned it in PyTorch on the dataset so that it can classify the class labels. The processed dataset was loaded, split into train and test sets, and the description column was tokenized so that it is accepted by the model.
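A sketch of the loading, splitting and tokenization step for the cause model, using the HuggingFace datasets library; the file name and the 80/20 split ratio are assumptions:

```python
from datasets import load_dataset
from transformers import AutoTokenizer

# Load the processed CSV and split it into train and test sets (assumed ratio).
raw = load_dataset("csv", data_files="data/cause.csv")["train"]
dataset = raw.train_test_split(test_size=0.2)

# Tokenize the description column so the inputs are accepted by BERT.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def tokenize(batch):
    return tokenizer(batch["description"], padding="max_length", truncation=True)

tokenized = dataset.map(tokenize, batched=True)
```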
Removed the unnecessary columns
Loaded the data as a PyTorch DataLoader object
Loaded the BERT model
Initialized the optimizer and learning rate scheduler
Fine-tuned the pre-trained model
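These steps follow the standard HuggingFace PyTorch fine-tuning workflow; a condensed sketch for the cause model is below (batch size, learning rate, epoch count and num_labels=12 are assumptions, and the body-part model would use num_labels=7):

```python
import torch
from torch.optim import AdamW
from torch.utils.data import DataLoader
from transformers import AutoModelForSequenceClassification, get_scheduler

# Drop the raw text column, rename 'label' to 'labels' (expected by the model)
# and switch the dataset format to PyTorch tensors.
tokenized = tokenized.remove_columns(["description"])
tokenized = tokenized.rename_column("label", "labels")
tokenized.set_format("torch")

train_loader = DataLoader(tokenized["train"], shuffle=True, batch_size=16)

# Load bert-base-uncased with a classification head for the 12 cause classes.
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=12)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

# Optimizer and linear learning rate scheduler.
optimizer = AdamW(model.parameters(), lr=5e-5)
num_epochs = 3
num_training_steps = num_epochs * len(train_loader)
lr_scheduler = get_scheduler("linear", optimizer=optimizer,
                             num_warmup_steps=0, num_training_steps=num_training_steps)

# Fine-tuning loop.
model.train()
for epoch in range(num_epochs):
    for batch in train_loader:
        batch = {k: v.to(device) for k, v in batch.items()}
        loss = model(**batch).loss
        loss.backward()
        optimizer.step()
        lr_scheduler.step()
        optimizer.zero_grad()

model.save_pretrained("cause_model")
```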
This method was used for both models, namely the cause_model to classify Cause - Hierarchy 1 and the bodypart_model to classify Body Part - Hierarchy 1.
I got the following accuracy on the validation set for the respective models.

Model for Classifying Cause - Hierarchy 1
Saving the model as cause_model
Model for Classifying Body Part - Hierarchy 1
Saving this model as bodypart_model
Prediction follows the same method as before for processing and loading the dataset: the test data is processed and saved in a CSV file.
Loading the processed test data and tokenizing it.
Loading both of the saved classification models
Running inference and obtaining the predictions.
Saving the predictions into the test.csv dataset.
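A self-contained sketch of the inference step; the saved model directories and the mapping of predictions back to class names follow the description above, but the details are assumptions:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
cause_model = AutoModelForSequenceClassification.from_pretrained("cause_model")
bodypart_model = AutoModelForSequenceClassification.from_pretrained("bodypart_model")

def predict(model, text):
    # Tokenize a single description and return the index of the highest logit.
    inputs = tokenizer(text, truncation=True, return_tensors="pt")
    with torch.no_grad():
        return int(model(**inputs).logits.argmax(dim=-1))

text = "punched ee in the face numerous times by person supported."
cause_id = predict(cause_model, text)        # map back to the class name using the
bodypart_id = predict(bodypart_model, text)  # label encoding from preprocessing
```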
- Ensure the packages in requirements.txt are installed
- First run EDA.ipynb – this contains the exploratory data analysis of the dataset; the processing is done here and the results are saved as CSV files, one for Cause - Hierarchy 1 and another for Body Part - Hierarchy 1.
- Then run cause_train.ipynb followed by bodypart_train.ipynb (or vice versa); this fine-tunes the bert-base-uncased model and saves the models locally.
- Run test.ipynb - this is where we load both models, predict the labels for test.csv, and save the results in a new test.csv file containing the predicted labels.
Sample input: "punched ee in the face numerous times by person supported., struck or injured by, head"
We get the output as:
```json
{
  "answer": [
    "struck or injured by",
    "head"
  ]
}
```
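For reference, a minimal sketch of how such a FastAPI endpoint could look; the /predict route and request schema are assumptions rather than the project's actual code:

```python
import torch
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
cause_model = AutoModelForSequenceClassification.from_pretrained("cause_model")
bodypart_model = AutoModelForSequenceClassification.from_pretrained("bodypart_model")

app = FastAPI()

class Query(BaseModel):
    description: str

def predict(model, text: str) -> int:
    inputs = tokenizer(text, truncation=True, return_tensors="pt")
    with torch.no_grad():
        return int(model(**inputs).logits.argmax(dim=-1))

@app.post("/predict")
def classify(query: Query):
    # The real app maps these label ids back to the class names produced
    # during preprocessing before returning the answer.
    cause_id = predict(cause_model, query.description)
    bodypart_id = predict(bodypart_model, query.description)
    return {"answer": [cause_id, bodypart_id]}
```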
- Dockerized the FastAPI application.
- All files to dockerize the application are in the app folder
- Obtain the fine-tuned models by running bodypart_train.ipynb and cause_train.ipynb
- Then copy the model files into the bodypart_model and causes_model directories inside the app folder
- Finally, run `docker compose up --build` to build the Docker image