Linformer Based Conversational Chatbot

Introduction

The Transformer has revolutionized the field of Natural Language Processing (NLP) with its attention mechanism. Several groundbreaking NLP models of recent years, such as GPT-3 and BERT, are based on the Transformer. However, the time and space complexity of the Transformer depends heavily on the length of the input sequence. More specifically, its self-attention mechanism has a time complexity of O(n^2), where n is the length of the input sequence. Wang et al. [1] proposed Linformer, a self-attention mechanism with linear complexity O(n) that can significantly speed up inference. We sought to find out whether Linformer could reduce training and inference time while still producing decent results for conversational chatbots, where the training input sequences vary in length but are mostly short.
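To make the complexity difference concrete, here is a minimal pure-Python sketch of single-head attention with and without the Linformer projection. The helper names (`matmul`, `softmax_rows`, `attention`, `linformer_attention`) are illustrative and are not taken from this repository; E and F stand for the k x n projection matrices described in [1], and the averaging projection used in the example is an arbitrary choice, not a learned one.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]

def softmax_rows(m):
    """Numerically stable softmax applied to each row."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def attention(q, k, v):
    """Standard scaled dot-product attention: the score matrix is n x n."""
    d = len(q[0])
    scores = matmul(q, transpose(k))
    scaled = [[s / math.sqrt(d) for s in row] for row in scores]
    return matmul(softmax_rows(scaled), v)

def linformer_attention(q, k, v, e, f):
    """Linformer-style attention: E and F (each k_dim x n) compress K and V
    along the sequence axis, so the score matrix is only n x k_dim."""
    return attention(q, matmul(e, k), matmul(f, v))

# Toy example: n = 4 tokens, d = 2 features, projected length k_dim = 2.
n, d, k_dim = 4, 2, 2
x = [[0.1 * (i + j) for j in range(d)] for i in range(n)]
e = [[1.0 / n] * n for _ in range(k_dim)]  # assumed averaging projection
out = linformer_attention(x, x, x, e, e)
print(len(out), len(out[0]))  # output keeps the n x d shape
```

The output has the same shape as standard attention, but the intermediate score matrix has n x k_dim entries instead of n x n, which only pays off when n is much larger than k_dim.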

Main requirements

Python 3.6 to 3.8 (3.6 preferred)*

*The list of dependencies required to run this project is in the requirements.txt file.

Special Thanks

This repository's implementation is heavily influenced by Clam004's chat-transformer [5].

How to Run

# Install dependencies
$ pip install -r requirements.txt

# Test run on default setting
$ python main.py

# A successful run should look like this

Main.py Adjustable Parameters

Description	DType	Arguments	Default
Name to save weights, at /saved/weights/	string	--weight	"weight"
Name of train data, at /saved/data/	string	--train	"data2_train_9010"
Name of test data, at /saved/data/	string	--test	"data2_test_9010"
Batch size	int	--batch	32
# of epochs	int	--epoch	200
Modeler: Transformer or Linformer	string	--modeler	"Linformer"
Linear Dimension of Linformer	int	--linear_dimension	256
Scheduler: plateau, cosine or warmup	string	--scheduler	"plateau"
Dimension of Attention Layers	int	--dimension	512
Number of Attention Layers	int	--nlayers	6
Number of Attention Heads	int	--heads	8
Learning rate	float	--lr	0.0003
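As a reading aid, the table above can be expressed as an `argparse` parser. This is a hypothetical sketch that only mirrors the documented flags and defaults; the function name `build_parser` is an assumption, and the actual parsing code in main.py may differ.

```python
import argparse

def build_parser():
    """Hypothetical parser mirroring the documented flags; main.py may differ."""
    p = argparse.ArgumentParser(description="Train a Transformer or Linformer chatbot")
    p.add_argument("--weight", type=str, default="weight")
    p.add_argument("--train", type=str, default="data2_train_9010")
    p.add_argument("--test", type=str, default="data2_test_9010")
    p.add_argument("--batch", type=int, default=32)
    p.add_argument("--epoch", type=int, default=200)
    p.add_argument("--modeler", type=str, default="Linformer")
    p.add_argument("--linear_dimension", type=int, default=256)
    p.add_argument("--scheduler", type=str, default="plateau",
                   choices=["plateau", "cosine", "warmup"])
    p.add_argument("--dimension", type=int, default=512)
    p.add_argument("--nlayers", type=int, default=6)
    p.add_argument("--heads", type=int, default=8)
    p.add_argument("--lr", type=float, default=0.0003)
    return p

# Equivalent to: python main.py --modeler Transformer --epoch 500
args = build_parser().parse_args(["--modeler", "Transformer", "--epoch", "500"])
print(args.modeler, args.epoch, args.lr)
```

Any flag that is not passed keeps the default listed in the table.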

How to chat with your saved model

# For Transformer run
$ python bot.py --model transformer --weight PATH

# For Linformer run
$ python bot.py --model linformer --weight PATH --linear_dimension SAME_AS_WEIGHT_SETTING

# You should see a prompt like this:

Bot.py Adjustable Parameters

Description	DType	Arguments	Default
Name of saved weights to load, at /saved/weights/	string	--weight	"weight"
Modeler: transformer or linformer	string	--modeler	"linformer"
Linear Dimension of Linformer	int	--linear_dimension	256

Repository and Code Structure

Executable scripts are located in the root directory.
The Transformer model, Linformer model, and Tokenizer script are in the "scripts" directory.
The default location for saved weights is the "saved/weight" directory.
The default data location is the "saved/data" directory.

Results

Selecting Learning Rate

We found the best learning rate to be 0.0003.

Selecting scheduler and number of epochs

We've chosen "Reduce on Plateau" as our learning rate scheduler.
We also selected 500 epochs for further training because the training loss flattens out after 500 epochs.
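The idea behind a reduce-on-plateau schedule can be sketched in a few lines of plain Python. This is a simplified illustration, not the scheduler used in this repository (which may rely on a library implementation); the class name `PlateauScheduler` and the `factor`, `patience`, and `min_lr` values are assumptions chosen for the example.

```python
class PlateauScheduler:
    """Simplified reduce-on-plateau rule (illustrative, not the repo's code)."""

    def __init__(self, lr, factor=0.5, patience=2, min_lr=1e-6):
        self.lr = lr
        self.factor = factor        # multiply lr by this when the loss stalls
        self.patience = patience    # epochs without improvement that are tolerated
        self.min_lr = min_lr        # lower bound for the learning rate
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, loss):
        """Record one epoch's loss and return the (possibly reduced) lr."""
        if loss < self.best:
            self.best = loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs > self.patience:
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.bad_epochs = 0
        return self.lr

# The loss stalls at 2.5 for three epochs, which exceeds patience=2.
sched = PlateauScheduler(lr=0.0003)
history = [sched.step(loss) for loss in [3.0, 2.5, 2.5, 2.5, 2.5, 2.4]]
print(history)
```

In this toy run the learning rate stays at its initial value until the third stalled epoch, then is halved once and kept there when the loss improves again.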

Where are the correct responses?

We observed that models with a training loss between 1 and 2 give correct responses with an acceptably high probability.

Which linear dimension lets Linformer reach a loss of 2?

Values of k between 32 and 256 could reach a loss of 2.

Our Linformer Results

k <= 64 works better with simple questions.
k >= 128 works better with more sophisticated questions.

Comparing total execution time at 500 epochs

Linformer does not reduce total execution time for the conversational chatbot.

Conclusion

In this project we trained various chatbots with Transformer and Linformer using different learning rates, schedulers, and values for k.

Using a subjective model evaluation method, we determined that models with lower training loss can produce a higher percentage of valid conversational responses. Linformer models with high k values tended to show training loss patterns similar to those of regular Transformer models. At k <= 64 the model worked better with simple utterances, while at k >= 128 it seemed to handle more sophisticated utterances better. Linformer was unable to reduce training time or inference time as we had initially hypothesized. Since most conversations contain fewer than 32 words, Linformer could not use the E and F projection matrices effectively. In fact, since Linformer has these two extra parameter matrices, the model took longer to train and test in these instances.
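A rough back-of-the-envelope count illustrates why short inputs gain nothing from the projection. The sketch below only counts attention score entries and the extra multiply-adds for the E and F projections; the function names and the chosen sequence lengths are assumptions for illustration, not measurements from this project, and it ignores padding and every other cost in the model.

```python
def attention_score_entries(n, k=None):
    """Entries in the attention score matrix for sequence length n.
    Standard attention: n * n. Linformer with projected length k: n * k."""
    return n * n if k is None else n * k

def projection_multiply_adds(n, k, d):
    """Extra multiply-adds per head to form E*K and F*V
    (each a k x n matrix times an n x d matrix)."""
    return 2 * k * n * d

# Compare a short utterance with a long document, both with k = 256.
for n in (16, 32, 512, 4096):
    standard = attention_score_entries(n)
    linformer = attention_score_entries(n, k=256)
    extra = projection_multiply_adds(n, k=256, d=64)
    print(n, standard, linformer, extra)
```

For n = 16 the standard score matrix is smaller than the projected one, and Linformer additionally pays for the two projections. The projected score matrix only becomes the smaller of the two once n exceeds k, which matches the observation that Linformer needs inputs much longer than the linear dimension to help.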

For further work, the effect of the k value on the quality of responses could be studied. During our experiments, models with higher k values tended to handle more sophisticated utterances better. Furthermore, the study could be repeated using longer forms of text whose input length is sufficiently larger than the linear dimension. This would enable Linformer to make use of the E and F matrices. For instance, the bot could be trained on full movie scripts to generate a movie script of its own, or it could be used to generate forum posts by learning from other posts.

References

[1] Linformer: Self-Attention with Linear Complexity, https://arxiv.org/pdf/2006.04768.pdf
[2] Cornell Movie-Dialog Corpus Dataset, https://www.cs.cornell.edu/~cristian/Cornell_Movie-Dialogs_Corpus.html
[3] Deep Learning Based Chatbot Models, https://arxiv.org/pdf/1908.08835.pdf
[4] Attention is All You Need, https://arxiv.org/abs/1706.03762
[5] Clam004 Chat Transformer, https://github.com/clam004/chat-transformer
