[Intents] Handling question marks #629

Closed
lucasvercelot opened this issue Jul 18, 2018 · 5 comments

@lucasvercelot

Hi guys!

I'm running into a small problem today with the question mark '?' character.
I noticed that when I fit my model's intent with a sentence that ends with a question mark, then parse that sentence with and without the question mark, I get different results and a large gap in the probability scores.
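
Roughly, the flow I'm following, shown here as a minimal Python snips-nlu sketch (my actual setup differs a bit, and the dataset file name is just an example):

import io
import json
from snips_nlu import SnipsNLUEngine

# dataset.json holds the whoAreYouFR intent, with "es-tu réel ?" as one of its utterances
with io.open("dataset.json", encoding="utf-8") as f:
    dataset = json.load(f)

engine = SnipsNLUEngine()
engine.fit(dataset)

print(engine.parse("es-tu réel"))
print(engine.parse("es-tu réel ?"))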

When trained with the question mark:

{ input: 'es-tu réel',
  slots: [],
  intent: { intentName: 'whoAreYouFR', probability: 0.46640502699361525 } }
{ input: 'es-tu réel ?',
  slots: [],
  intent: { intentName: 'whoAreYouFR', probability: 1 } }

When trained without the question mark:

{ input: 'es-tu réel',
  slots: [],
  intent: { intentName: 'whoAreYouFR', probability: 1 } }
{ input: 'es-tu réél ?',
  slots: [],
  intent: { intentName: 'whoAreYouFR', probability: 0.40673316662106906 } }

When trained with both sentences 'es-tu réel' and 'es-tu réel ?':

{ input: 'es-tu réel ?',
  slots: [],
  intent: { intentName: 'whoAreYouFR', probability: 1 } }
{ input: 'es-tu réel',
  slots: [],
  intent: { intentName: 'whoAreYouFR', probability: 1 } }

Do you have any idea why it's doing this? Is it normal?

Thanks in advance! :)

@adrienball
Contributor

Hey @lucasvercelot ,
This is actually expected behavior, even though it can be confusing. The SnipsNLUEngine contains two intent parsers that are called successively, the second one being called only when the first one doesn't find any match.
The first parser, which we call the DeterministicIntentParser, is based on pattern matching (regular expressions) and ensures perfect accuracy on the training data. This parser will match, with a probability of 1.0, any input that was provided in the training data, and even small variations around it.
When the input hasn't been seen at training time, this parser fails to find a match, and the second parser, the ProbabilisticIntentParser, is called. This one is based on machine learning (logistic regression and Conditional Random Fields) and is able to extract intents and slots from inputs that can be very different from what was provided at training time.
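
Purely as an illustration (this is not the actual library code; the parser objects and the None return value are simplified stand-ins), the cascade behaves roughly like this:

def parse(text, deterministic_parser, probabilistic_parser):
    # The deterministic parser only matches inputs (and small variations of them)
    # that were seen in the training data, and returns them with probability 1.0.
    result = deterministic_parser.parse(text)
    if result is None:
        # No pattern match: fall back to the machine-learning parser,
        # which returns an actual classification probability.
        result = probabilistic_parser.parse(text)
    return result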

In your example with the ?, all the cases where you get a classification probability of 1.0 correspond to inputs that were parsed by the DeterministicIntentParser because they were in the training data.

I guess the confusion comes from the probability gap between sentences that are very close. You can disable the first parser by removing it from the NLUEngineConfig used when instantiating the SnipsNLUEngine, so that you get consistent probabilities; however, this might degrade the parsing accuracy a bit. A rough sketch of what that looks like is below.
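
Something along these lines should work with the Python library (a minimal sketch; double-check the import paths and config class names against the version you're using):

from snips_nlu import SnipsNLUEngine
from snips_nlu.pipeline.configs import NLUEngineConfig, ProbabilisticIntentParserConfig

# Keep only the probabilistic parser: no DeterministicIntentParserConfig in the list
config = NLUEngineConfig([ProbabilisticIntentParserConfig()])
engine = SnipsNLUEngine(config=config)
# then engine.fit(dataset) and engine.parse(...) as usual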

To get a better understanding of the whole pipeline, you can have a look at the blog post we published when we open-sourced the library.

I hope this helps :)

@lucasvercelot
Author

Hi @adrienball ,

I read that article a few months ago (it's great, by the way), and I completely forgot that the NLUEngine has two parsers, my bad! :)

I'll try disabling the first parser and see what I get!

Thanks a lot, I'll come back to you to let you know.

@lucasvercelot
Author

Ok, so I tried without the DeterministicIntentParser, and now the probability scores for the sentence with the ? and the one without it are much closer, but much lower as well:

{"input": "es-tu réel ?", "slots": [], "intent": {"intentName": "whoAreYouFR", "probability": 0.5505541897507736}}

{"input": "es-tu réel", "slots": [], "intent": {"intentName": "whoAreYouFR", "probability": 0.38014287318577317}}

I was expecting a higher score.
Now, for some intents trained with single words or short sentences, it doesn't even return an intent at all.
I wonder if this is normal behavior, or if I should change the way I write the intents' training sentences.

I don't know if it's better to create entities when I want to train some intents with just words or small groups of words (2-4 word sentences). I noticed that even for longer sentences the probability score is pretty low, lower than when both parsers are enabled.

@adrienball
Contributor

The probabilities returned by the ProbabilisticIntentParser are not affected by whether the other parser is enabled or not.
The feedback we get about probabilities is that they are usually lower than what people expect. For now there is not much we can do about it if we stick with the current model (logistic regression), which performs quite well in terms of accuracy. In theory, you shouldn't have to worry about the absolute probabilities, as the intent which is returned is supposed to be the most likely one. However, comparing probabilities among the top intents could be useful, but for now only the most likely intent is returned. See the thread about this: #623.

You shouldn't have to use entities if you don't need to, but it can indeed be faster to generate entity values from the intent vocabulary (if you know it exhaustively) instead of writing many sentences that try to cover all the vocabulary. So that's something to try; a rough sketch of what such a dataset fragment could look like is below.
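
Something like this (a sketch only, using a hypothetical "botTopic" entity and "topic" slot name; check the exact field names against the JSON dataset format documentation):

dataset = {
    "language": "fr",
    "entities": {
        "botTopic": {  # hypothetical entity listing the vocabulary you know exhaustively
            "data": [
                {"value": "réel", "synonyms": ["vrai", "humain"]},
            ],
            "use_synonyms": True,
            "automatically_extensible": False,
        }
    },
    "intents": {
        "whoAreYouFR": {
            "utterances": [
                {
                    "data": [
                        {"text": "es-tu "},
                        {"text": "réel", "entity": "botTopic", "slot_name": "topic"},
                    ]
                }
            ]
        }
    },
}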

Intent classification relies on characteristic words, so you can also try to add more sentences and have them contain some domain-specific words.

@lucasvercelot
Author

Alright, thank you very much for all this information and for your help @adrienball! 😄
