
Make training code available? #2

Open
JohnnyC08 opened this issue Apr 27, 2021 · 9 comments

@JohnnyC08

I'm interested in reproducing the model in PyTorch and am curious how you preprocessed the data and trained it. I didn't see any metrics reported and would like to see what those look like too, so the training script would be nice to have as well!

Great repo by the way!

@creatorrr

+1 @JohnnyC08

It'd also be interesting to see if including context (conversation history / last utterances) improves the accuracy of predictions.

@JohnnyC08

@creatorrr That's interesting.

How would you go about doing that? My first thought is to use a rolling window, concatenate the utterances in the window into a single block of text, and assign the block the label of the last utterance in the window.

How would you do it?
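For illustration, here's a minimal sketch of that rolling-window idea; the `utterances`, `labels`, and `window_size` names are placeholders, not anything from this repo:

```python
# Concatenate the last `window_size` utterances into one text block and
# assign it the dialog-act label of the final utterance in the window.
def make_windows(utterances, labels, window_size=3):
    examples = []
    for i in range(len(utterances)):
        start = max(0, i - window_size + 1)
        block = " ".join(utterances[start:i + 1])
        examples.append((block, labels[i]))  # label of the last utterance
    return examples

utterances = ["Do you want to grab lunch?", "Not really.", "Oh okay.", "How about tomorrow?"]
labels = ["Yes-No-Question", "Dispreferred-answers", "Response-Acknowledgement",
          "Open-Question"]  # last label is made up for the example
print(make_windows(utterances, labels))
```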

@creatorrr

@JohnnyC08 I was thinking of something simpler: prepending the dialog-act labels of the last three utterances to the input when fine-tuning. For example, take this conversation:

A: Do you want to grab lunch? [Yes-No-Question]
B: Not really. [Dispreferred-answers]
A: Oh okay. [Response-Acknowledgement]
B: How about tomorrow? <<TO PREDICT>>

Then the input vector would be:
[CLS] Yes-No-Question [SEP] Dispreferred-answers [SEP] Response-Acknowledgement [SEP] How about tomorrow? [SEP]
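A minimal sketch of how that input could be assembled with a Hugging Face tokenizer; the `distilbert-base-uncased` checkpoint is only an assumption for illustration:

```python
# Prepend the dialog-act labels of the previous utterances to the current
# utterance before tokenizing; the tokenizer adds [CLS] and the final [SEP].
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

prev_labels = ["Yes-No-Question", "Dispreferred-answers", "Response-Acknowledgement"]
utterance = "How about tomorrow?"

# Join the history labels with [SEP] and pass the utterance as the second segment.
context = f" {tokenizer.sep_token} ".join(prev_labels)
encoded = tokenizer(context, utterance, return_tensors="pt")

# Inspect the assembled sequence.
print(tokenizer.decode(encoded["input_ids"][0]))
```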

@argideritzalpea

@creatorrr @JohnnyC08 Did either of you end up creating a previous-context-dependent model? Also, were you able to successfully predict on a GPU? Loading the model allocates all of my card's memory, which suggests a leak when loading the downloaded model.
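One hedged guess: if the released model is TensorFlow/Keras-based (an assumption on my part), this may not be a leak at all, since TensorFlow reserves all GPU memory by default. Enabling memory growth before loading avoids that:

```python
# Assumes a TensorFlow-backed model; TF grabs all GPU memory up front unless
# memory growth is enabled before the model is loaded.
import tensorflow as tf

for gpu in tf.config.list_physical_devices("GPU"):
    tf.config.experimental.set_memory_growth(gpu, True)

# ...then load the model as usual.
```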

@argideritzalpea

argideritzalpea commented Dec 18, 2021

@bhavitvyamalik Thanks again for publishing the model. I think that some comments on how training was conducted would really make this repo more complete.

- What are the inputs for training on the SBWA corpus? Are they single sentences or sequences of sentences?
- What training scripts were used to train this model?
- Are there any utilities to customize this for another dataset?
- What parameters were used for fine-tuning?
- Which outputs of the DistilBERT encoding are used for the classification task?

I am attempting to use this for DA labeling on a conversational dataset, and it gives varying and poor results for the same simple sentence "Okay." I assume this is because of dropout and over- or under-fitting. Overall, I'm not sure this model gives me the confidence required to use it for my project as is. If the training scripts and the data were released, that would be awesome!
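As a sanity check on the dropout theory, here is a generic sketch (the checkpoint id is a placeholder, not this repo's actual model or loading API): with the model in eval mode, dropout is disabled, so repeated predictions on the same input should be identical; if they still vary, the cause is something else.

```python
# Run the same input several times with dropout disabled and compare outputs.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

ckpt = "your-org/distilbert-dialog-acts"  # hypothetical placeholder id
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForSequenceClassification.from_pretrained(ckpt)
model.eval()  # disables dropout, so inference should be deterministic

inputs = tokenizer("Okay.", return_tensors="pt")
with torch.no_grad():
    for _ in range(3):
        logits = model(**inputs).logits
        print(logits.argmax(-1).item())  # should print the same label id each time
```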

@creatorrr

creatorrr commented Jan 19, 2022

> @creatorrr @JohnnyC08 Did either of you end up creating a previous-context-dependent model? Also, were you able to successfully predict on a GPU? Loading the model allocates all of my card's memory, which suggests a leak when loading the downloaded model.

Haven’t gotten around to it yet, been really busy but will give it a try one of these weekends @argideritzalpea

@creatorrr

> @creatorrr That's interesting.
>
> How would you go about doing that? My first thought is to use a rolling window, concatenate the utterances in the window into a single block of text, and assign the block the label of the last utterance in the window.
>
> How would you do it?

Ever get a chance to try this out, @JohnnyC08?

@hannan72

hannan72 commented Aug 21, 2022

Hi,
Could you please share the training scripts?
Also could you please share the link to the training data?

@creatorrr

@JohnnyC08 I ended up training a DeBERTa-based dialog act classifier on the silicone-merged dataset using sentence pairs (previous utterance, current utterance), and it performs better than with single utterances. You can take a look here.
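For anyone wanting to try it, a sketch of sentence-pair inference with the generic transformers API; the checkpoint id below is a placeholder, so substitute the model linked above:

```python
# Feed (previous utterance, current utterance) as a sentence pair to the classifier.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

ckpt = "your-org/deberta-dialog-acts-pair"  # placeholder, not the actual model id
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForSequenceClassification.from_pretrained(ckpt)
model.eval()

prev, curr = "Not really.", "How about tomorrow?"
inputs = tokenizer(prev, curr, return_tensors="pt")
with torch.no_grad():
    pred = model(**inputs).logits.argmax(-1).item()
print(model.config.id2label[pred])
```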
