
Multilabel Classifier for Injury using BERT

Description

The goal is to classify the Cause and the Body Part from the description of an injury using the BERT model. The dataset was processed and saved to csv files. Then, for training, the bert-base-uncased model from HuggingFace was used. Two models were built, one for classifying the Cause and another for classifying the Body Part.


Directory Structure:

The Jupyter notebook files are in the Model Training folder. The train.csv and test.csv files are inside the ‘data’ directory, which is also used to store the processed csv files that are loaded later during model training.

Preprocessing the Dataset / Exploratory Data Analysis – EDA.ipynb:

Loading the train.csv file, we can see that it has 6 columns [LossDescription, ResultingInjuryDesc, PartInjuredDesc, Cause - Hierarchy 1, Body Part - Hierarchy 1, Index]; the file is loaded using the Index column as the dataframe index.

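A minimal sketch of this loading step, assuming the file lives at data/train.csv:

import pandas as pd

# Load the raw training data, using the "Index" column as the dataframe index
df = pd.read_csv("data/train.csv", index_col="Index")
print(df.columns.tolist())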

The texts in [LossDescription, ResultingInjuryDesc, PartInjuredDesc] were merged into a single column called ‘description’, and the three original columns were dropped.

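The merge can be done along these lines (a sketch; the exact separator used in the notebook may differ):

# Concatenate the three free-text fields into a single "description" column,
# then drop the originals
text_cols = ["LossDescription", "ResultingInjuryDesc", "PartInjuredDesc"]
df["description"] = df[text_cols].fillna("").agg(" ".join, axis=1)
df = df.drop(columns=text_cols)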

Now our dataframe only has 3 columns namely [Cause - Hierarchy 1, Body Part - Hierarchy 1, description].


The text inside the dataframe is converted to lowercase.

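For example:

# Normalize case so the text matches BERT's uncased vocabulary
df["description"] = df["description"].str.lower()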

Then we split the dataframe into two separate dataframes, because the two targets are classified separately. Once split, the null values are dropped from both dataframes.

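A sketch of the split, using the column names above:

# One dataframe per target, so each classifier can be trained independently;
# rows with a missing target are dropped
cause_df = df[["description", "Cause - Hierarchy 1"]].dropna()
body_df = df[["description", "Body Part - Hierarchy 1"]].dropna()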

We can see that there are 12 unique classes in Cause - Hierarchy 1 and 7 in Body Part - Hierarchy 1.


Now the class names are converted into integer labels in a new column called label.

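One way to do this with pandas (the notebook's exact encoding may differ):

# Map each class name to an integer id in a new "label" column
cause_df["label"] = cause_df["Cause - Hierarchy 1"].astype("category").cat.codes
body_df["label"] = body_df["Body Part - Hierarchy 1"].astype("category").cat.codes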

These dataframes are then saved as csv files inside the data directory, containing only the columns [description, label].

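For example (the file names cause.csv and bodypart.csv are assumptions):

# Persist only the two columns the training notebooks need
cause_df[["description", "label"]].to_csv("data/cause.csv", index=False)
body_df[["description", "label"]].to_csv("data/bodypart.csv", index=False)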

Model Training – cause_train.ipynb, bodypart_train.ipynb:

I used the bert-base-uncased pre-trained model from HuggingFace for classification. I loaded the model and fine-tuned it in PyTorch on the dataset so that it can predict the class labels. I loaded the processed dataset, split it into train and test sets, then tokenized the description column so that it is accepted by the model.

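A sketch of this step with the HuggingFace datasets library; the file name data/cause.csv and the 80/20 split are assumptions:

from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Load the processed csv and carve out a held-out split
dataset = load_dataset("csv", data_files="data/cause.csv")["train"]
dataset = dataset.train_test_split(test_size=0.2)

def tokenize(batch):
    # Pad/truncate every description to BERT's fixed input length
    return tokenizer(batch["description"], padding="max_length", truncation=True)

dataset = dataset.map(tokenize, batched=True)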

Removed the unnecessary columns.

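For instance, keeping only what the model consumes:

# Drop the raw text; rename "label" to "labels", the key
# BertForSequenceClassification uses to compute the loss
dataset = dataset.remove_columns(["description"])
dataset = dataset.rename_column("label", "labels")
dataset.set_format("torch")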

Loaded the data as a PyTorch DataLoader object.

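For example, with an assumed batch size of 8:

from torch.utils.data import DataLoader

train_loader = DataLoader(dataset["train"], shuffle=True, batch_size=8)
eval_loader = DataLoader(dataset["test"], batch_size=8)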

Loaded the BERT model.

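A sketch of this step; the 12-class count comes from the EDA above (use num_labels=7 for the Body Part model):

from transformers import AutoModelForSequenceClassification

# A fresh classification head is attached on top of the pre-trained encoder
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=12
)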

Initialized the optimizer and learning rate scheduler.

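Something along these lines; the learning rate, epoch count, and linear schedule are assumptions:

from torch.optim import AdamW
from transformers import get_scheduler

num_epochs = 3  # assumed; the notebook may use a different value
optimizer = AdamW(model.parameters(), lr=5e-5)
lr_scheduler = get_scheduler(
    "linear",
    optimizer=optimizer,
    num_warmup_steps=0,
    num_training_steps=num_epochs * len(train_loader),
)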

Fine-tuned the pre-trained model.

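A minimal version of the fine-tuning loop, reusing the objects defined above:

import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.train()

for epoch in range(num_epochs):
    for batch in train_loader:
        batch = {k: v.to(device) for k, v in batch.items()}
        outputs = model(**batch)  # the "labels" key makes the model return a loss
        outputs.loss.backward()
        optimizer.step()
        lr_scheduler.step()
        optimizer.zero_grad()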

This method was used for both models: cause_model to classify Cause - Hierarchy 1 and bodypart_model to classify Body Part - Hierarchy 1.

Evaluation:

I got the following accuracy on the validation set for the respective models.

Model for Classifying Cause - Hierarchy 1

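A sketch of how that accuracy can be computed, reusing eval_loader and device from above:

model.eval()
correct = total = 0
for batch in eval_loader:
    batch = {k: v.to(device) for k, v in batch.items()}
    with torch.no_grad():
        logits = model(**batch).logits
    preds = logits.argmax(dim=-1)
    correct += (preds == batch["labels"]).sum().item()
    total += batch["labels"].size(0)
print(f"validation accuracy: {correct / total:.4f}")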

Saving the model as cause_model

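For example:

# Writes the fine-tuned weights and config to the cause_model directory
model.save_pretrained("cause_model")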

Model for Classifying Body Part - Hierarchy 1


Saving this model as bodypart_model


Model Inference / Prediction – test.ipynb:

Prediction follows the same method as before: the test data is processed, saved in a csv file, and then loaded.


Loading the processed test data and tokenizing it.


Loading both of the saved classification models


Running the inference and obtaining the prediction.
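
A sketch of this step, assuming the two model directories saved above; mapping the predicted ids back to class names reuses the label encoding built in EDA.ipynb:

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
cause_model = AutoModelForSequenceClassification.from_pretrained("cause_model")
bodypart_model = AutoModelForSequenceClassification.from_pretrained("bodypart_model")

def predict(text, model):
    inputs = tokenizer(text, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return logits.argmax(dim=-1).item()  # integer label id

text = "punched ee in the face numerous times by person supported."
cause_id = predict(text, cause_model)
body_id = predict(text, bodypart_model)
# id_to_cause / id_to_bodypart dicts (recoverable from the EDA label encoding)
# map these ids back to class names such as "struck or injured by"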

Saving the predictions into the test.csv dataset.


Reproducing the Code:

  • Ensure the packages in requirements.txt are installed.
  • First run EDA.ipynb – this contains the exploratory data analysis of the dataset; the processing is done and the results are saved as csv files, one for Cause - Hierarchy 1 and another for Body Part - Hierarchy 1.
  • Then run cause_train.ipynb followed by bodypart_train.ipynb, or vice versa; this fine-tunes the bert-base-uncased model and saves the models locally.
  • Run test.ipynb – this is where we load both models and predict the labels for test.csv, saving the results in a new test.csv file that contains the predicted labels.

FastAPI Integration

Sample input: "punched ee in the face numerous times by person supported., struck or injured by, head"


We get the output as:

{
  "answer": [
    "struck or injured by",
    "head"
  ]
}
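
A minimal sketch of what the endpoint might look like; the /predict route and Query schema are assumptions, and predict, the two models, and the id-to-name dicts refer to the inference sketch above. Only the {"answer": [...]} response shape is taken from the output shown:

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Query(BaseModel):  # hypothetical request schema
    text: str

@app.post("/predict")  # hypothetical route name
def classify(query: Query):
    # Run both classifiers on the same description text
    cause_id = predict(query.text, cause_model)
    body_id = predict(query.text, bodypart_model)
    return {"answer": [id_to_cause[cause_id], id_to_bodypart[body_id]]}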

Dockerization

  • Dockerized the FastAPI application.
  • All files to dockerize the application are in the app folder.

Steps to Run the Application

  • Obtain the fine-tuned models by running bodypart_train.ipynb and cause_train.ipynb.
  • Then copy the model files into the bodypart_model and causes_model folders inside the app folder.
  • Finally, run docker compose up --build to build and start the docker image.
