
Linformer Based Conversational Chatbot


Introduction

The Transformer has revolutionized the field of Natural Language Processing with its attention mechanism. Some of the groundbreaking NLP models of recent years, such as GPT-3 and BERT, are all based on the Transformer. However, the time and space complexity of the Transformer depends heavily on the length of the input sequence: its self-attention mechanism has a time complexity of O(n²), where n is the length of the input sequence. Wang et al. [1] proposed the Linformer, a self-attention mechanism with linear complexity O(n) that can speed up inference significantly. We sought to find out whether the Linformer could reduce training and inference time while still producing decent results for conversational chatbots, where the training input sequences vary in length but are mostly short.
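To make the idea concrete, here is a minimal single-head sketch of Linformer-style attention in NumPy. This is not the project's actual implementation; the names, shapes, and random projections are illustrative. The key point is that E and F compress the length-n keys and values down to k rows, so the attention map is n × k instead of n × n:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def linformer_attention(Q, K, V, E, F):
    """Linformer self-attention: E and F project the length-n key/value
    sequences down to a fixed dimension k, so the attention map is n x k
    (linear in n) rather than n x n (quadratic in n)."""
    d = Q.shape[-1]
    K_proj = E @ K                        # (k, d) <- (k, n) @ (n, d)
    V_proj = F @ V                        # (k, d)
    scores = Q @ K_proj.T / np.sqrt(d)    # (n, k) attention map
    return softmax(scores) @ V_proj       # (n, d)

# Toy shapes: sequence length n = 512, model dim d = 64, linear dim k = 32.
n, d, k = 512, 64, 32
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
E, F = (rng.standard_normal((k, n)) / np.sqrt(n) for _ in range(2))
out = linformer_attention(Q, K, V, E, F)
print(out.shape)  # (512, 64): same output shape as standard attention
```

The output shape matches standard self-attention; only the intermediate attention map shrinks, which is where the O(n) behaviour comes from.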

Main requirements

  • Python 3.6 to 3.8 (3.6 preferred)*

*The list of dependencies required to run this project is in the requirements.txt file.

Special Thanks

This repository's implementation was heavily influenced by Clam004 [5].

How to Run

# Install dependencies
$ pip install -r requirements.txt

# Test run on default setting
$ python main.py

# A successful run should look like this

[Screenshot: example output of a successful main.py run]

Main.py Adjustable Parameters

| Description | DType | Argument | Default |
| --- | --- | --- | --- |
| Name to save weights under, at /saved/weights/ | string | --weight | "weight" |
| Name of train data, at /saved/data/ | string | --train | "data2_train_9010" |
| Name of test data, at /saved/data/ | string | --test | "data2_test_9010" |
| Batch size | int | --batch | 32 |
| Number of epochs | int | --epoch | 200 |
| Modeler: Transformer or Linformer | string | --modeler | "Linformer" |
| Linear dimension of the Linformer | int | --linear_dimension | 256 |
| Scheduler: plateau, cosine or warmup | string | --scheduler | "plateau" |
| Dimension of attention layers | int | --dimension | 512 |
| Number of attention layers | int | --nlayers | 6 |
| Number of attention heads | int | --heads | 8 |
| Learning rate | float | --lr | 0.0003 |
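The flags in the table could be declared with Python's argparse roughly as follows. This is a sketch reconstructed from the table above, not the actual contents of main.py, which may differ:

```python
import argparse

def build_parser():
    # CLI sketch matching the parameter table; defaults copied from above.
    p = argparse.ArgumentParser(description="Train a Transformer/Linformer chatbot")
    p.add_argument("--weight", default="weight", help="name to save weights under /saved/weights/")
    p.add_argument("--train", default="data2_train_9010", help="train data name in /saved/data/")
    p.add_argument("--test", default="data2_test_9010", help="test data name in /saved/data/")
    p.add_argument("--batch", type=int, default=32, help="batch size")
    p.add_argument("--epoch", type=int, default=200, help="number of epochs")
    p.add_argument("--modeler", default="Linformer", choices=["Transformer", "Linformer"])
    p.add_argument("--linear_dimension", type=int, default=256, help="Linformer linear dimension k")
    p.add_argument("--scheduler", default="plateau", choices=["plateau", "cosine", "warmup"])
    p.add_argument("--dimension", type=int, default=512, help="dimension of attention layers")
    p.add_argument("--nlayers", type=int, default=6, help="number of attention layers")
    p.add_argument("--heads", type=int, default=8, help="number of attention heads")
    p.add_argument("--lr", type=float, default=0.0003, help="learning rate")
    return p

args = build_parser().parse_args([])  # empty argv -> all defaults
print(args.modeler, args.linear_dimension)  # Linformer 256
```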

How to chat with your saved model

# For Transformer run
$ python bot.py --model transformer --weight PATH

# For Linformer run
$ python bot.py --model linformer --weight PATH --linear_dimension SAME_AS_WEIGHT_SETTING

# You should see a prompt like this:

[Screenshot: example bot.py chat prompt]

Bot.py Adjustable Parameters

| Description | DType | Argument | Default |
| --- | --- | --- | --- |
| Name of saved weights to load, at /saved/weights/ | string | --weight | "weight" |
| Modeler: transformer or linformer | string | --modeler | "linformer" |
| Linear dimension of the Linformer | int | --linear_dimension | 256 |

Repository and Code Structure

  • Execution files are located in the root directory.
  • The Transformer model, Linformer model, and tokenizer scripts are in the "scripts" directory.
  • The default saved-weight location is the "saved/weight" directory.
  • The default data location is the "saved/data" directory.

Results

Selecting Learning Rate

  • We found the best learning rate to be 0.0003.

Selecting scheduler and number of epochs

  • We chose "Reduce on Plateau" as our learning rate scheduler.
  • We also selected 500 epochs for further training, because the training loss flattens out after 500 epochs.

[Chart: training loss for each scheduler]
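The "Reduce on Plateau" policy can be illustrated with a small pure-Python sketch. The project itself would typically use a library scheduler such as PyTorch's ReduceLROnPlateau; the class name and hyper-parameters below are illustrative, not taken from the code:

```python
class ReduceOnPlateau:
    """Minimal sketch of the 'Reduce on Plateau' idea: if the loss fails to
    improve for `patience` epochs in a row, scale the learning rate down."""

    def __init__(self, lr=0.0003, factor=0.5, patience=2):
        self.lr, self.factor, self.patience = lr, factor, patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, loss):
        if loss < self.best:          # improvement: reset the counter
            self.best = loss
            self.bad_epochs = 0
        else:                         # no improvement this epoch
            self.bad_epochs += 1
            if self.bad_epochs > self.patience:
                self.lr *= self.factor
                self.bad_epochs = 0
        return self.lr

sched = ReduceOnPlateau(lr=0.0003, factor=0.5, patience=2)
for loss in [2.0, 1.5, 1.5, 1.5, 1.5]:  # loss plateaus after epoch 2
    lr = sched.step(loss)
print(lr)  # 0.00015 -- the rate was halved once
```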

Where are the correct responses?

  • We observed that models with a training loss between 1 and 2 produce a good rate of correct responses.

[Table: Transformer response quality vs. training loss]

Which linear dimension for the Linformer reached a loss of 2?

  • Values of k between 32 and 256 could reach a loss of 2.

[Chart: Linformer training loss by linear dimension k]

Our Linformer Results

  • K <= 64 works better with simple questions.
  • K >= 128 works better with more sophisticated questions.

[Table: Linformer response quality by k]

Comparing total execution time at 500 epochs

  • Linformer does not reduce total execution time for a conversational chatbot.
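A rough multiply-count estimate suggests why: for short utterances the k × n projections cost more than the n × n attention they replace. The figures below are back-of-envelope, per layer, single head, ignoring softmax and output projections:

```python
# Back-of-envelope multiply counts for one attention layer (d = 512).
def standard_attn_flops(n, d):
    # Q K^T (n*n*d) plus attention-map @ V (n*n*d)
    return 2 * n * n * d

def linformer_attn_flops(n, k, d):
    # project K and V (2*k*n*d), Q K_proj^T (n*k*d), attn @ V_proj (n*k*d)
    return 4 * n * k * d

n, k, d = 32, 256, 512  # a typical short chat utterance vs. the default k
print(standard_attn_flops(n, d))      # 1048576
print(linformer_attn_flops(n, k, d))  # 16777216
```

With n = 32 and the default k = 256, the projected attention does roughly 16x more multiplies than standard attention; the O(nk) path only wins once n grows well past k.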

Conclusion

In this project we trained various chatbots with Transformer and Linformer using different learning rates, schedulers, and values for k.

Using a subjective model evaluation method, we determined that models with lower training loss can produce a higher percentage of valid conversational responses. Linformer models with high k values tended to have training loss patterns similar to regular Transformer models. At k <= 64 the model worked better with simple utterances, while at k >= 128 the model seemed better at handling more sophisticated utterances. The Linformer was unable to reduce training or inference time as we had initially hypothesized. Since most conversations contain fewer than 32 words, the Linformer could not utilize its E and F matrices effectively. In fact, since the Linformer has two extra projection matrices, in these instances the model took longer to train and test.

For further work, the effect of the k value on the quality of responses could be studied; during experimentation, models with higher k values tended to handle more sophisticated utterances better. Furthermore, the study could be repeated using longer forms of text whose input length is sufficiently larger than the linear dimension, which would enable the Linformer to utilize the E and F matrices. For instance, the bot could be trained on full movie scripts to generate a movie script of its own, or it could be used to generate posts on a forum by looking at other posts.
