[Intents] Handling question marks #629

Closed
lucasvercelot opened this issue Jul 18, 2018 · 5 comments

@lucasvercelot

Hi guys!

I'm running into a small problem today with the question mark '?' character.
I noticed that when I fit my model's intent with a sentence that ends with a question mark, then parse that sentence with and without the question mark, I get different results and a large gap in the probability scores.
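
Roughly, the flow I'm following, shown here as a minimal Python snips-nlu sketch (my actual setup differs a bit, and the dataset file name is just an example):

import io
import json
from snips_nlu import SnipsNLUEngine

# dataset.json holds the whoAreYouFR intent, with "es-tu réel ?" as one of its utterances
with io.open("dataset.json", encoding="utf-8") as f:
    dataset = json.load(f)

engine = SnipsNLUEngine()
engine.fit(dataset)

print(engine.parse("es-tu réel"))
print(engine.parse("es-tu réel ?"))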

When trained with the question mark:

{ input: 'es-tu réel',
  slots: [],
  intent: { intentName: 'whoAreYouFR', probability: 0.46640502699361525 } }
{ input: 'es-tu réel ?',
  slots: [],
  intent: { intentName: 'whoAreYouFR', probability: 1 } }

When trained without the question mark:

{ input: 'es-tu réel',
  slots: [],
  intent: { intentName: 'whoAreYouFR', probability: 1 } }
{ input: 'es-tu réél ?',
  slots: [],
  intent: { intentName: 'whoAreYouFR', probability: 0.40673316662106906 } }

When trained with both sentences 'es-tu réel' and 'es-tu réel ?':

{ input: 'es-tu réel ?',
  slots: [],
  intent: { intentName: 'whoAreYouFR', probability: 1 } }
{ input: 'es-tu réel',
  slots: [],
  intent: { intentName: 'whoAreYouFR', probability: 1 } }

Do you have any idea why it's doing this? Is it normal?

Thanks in advance! :)

@adrienball
Contributor

Hey @lucasvercelot ,
This is actually expected behavior, even though it can be confusing. The SnipsNLUEngine contains two intent parsers that are called successively, the second one being called only when the first one doesn't find any match.
The first parser, which we call the DeterministicIntentParser, is based on pattern matching (regular expressions) and ensures perfect accuracy on the training data. This parser will match, with a probability of 1.0, any input that was provided in the training data, and even small variations around it.
When the input hasn't been seen at training time, this parser fails to find a match, and the second parser, the ProbabilisticIntentParser, is called. This one is based on machine learning (logistic regression and Conditional Random Fields) and is able to extract intents and slots from inputs that can be very different from what was provided at training time.
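
Purely as an illustration (this is not the actual library code; the parser objects and the None return value are simplified stand-ins), the cascade behaves roughly like this:

def parse(text, deterministic_parser, probabilistic_parser):
    # The deterministic parser only matches inputs (and small variations of them)
    # that were seen in the training data, and returns them with probability 1.0.
    result = deterministic_parser.parse(text)
    if result is None:
        # No pattern match: fall back to the machine-learning parser,
        # which returns an actual classification probability.
        result = probabilistic_parser.parse(text)
    return result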

In your example with the ?, all the cases where you get a classification probability of 1.0 correspond to inputs that were parsed by the DeterministicIntentParser because they were in the training data.

I guess the confusion comes from the probability gap between sentences that are very close. You can disable the first parser by removing it from the NLUEngineConfig used when instantiating the SnipsNLUEngine, so that you get consistent probabilities; however, this might degrade the parsing accuracy a bit. A rough sketch of what that looks like is below.
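
Something along these lines should work with the Python library (a minimal sketch; double-check the import paths and config class names against the version you're using):

from snips_nlu import SnipsNLUEngine
from snips_nlu.pipeline.configs import NLUEngineConfig, ProbabilisticIntentParserConfig

# Keep only the probabilistic parser: no DeterministicIntentParserConfig in the list
config = NLUEngineConfig([ProbabilisticIntentParserConfig()])
engine = SnipsNLUEngine(config=config)
# then engine.fit(dataset) and engine.parse(...) as usual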

To get a better understanding of the whole pipeline, you can have a look at the blog post we published when we open-sourced the library.

I hope this helps :)

@lucasvercelot
Author

Hi @adrienball ,

I read that article a few months ago (it's great, by the way), and I completely forgot that the NLUEngine has two parsers, my bad! :)

I'll try disabling the first parser and see what I get!

Thanks a lot, I'll come back to you to let you know.

@lucasvercelot
Author

Ok, so I tried without the DeterministicIntentParser, and now the probability scores for the sentence with the ? and the one without it are much closer, but much lower as well:

{"input": "es-tu réel ?", "slots": [], "intent": {"intentName": "whoAreYouFR", "probability": 0.5505541897507736}}

{"input": "es-tu réel", "slots": [], "intent": {"intentName": "whoAreYouFR", "probability": 0.38014287318577317}}

I was expecting a higher score.
Now, for some intents trained with single words or short sentences, it doesn't even return an intent at all.
I wonder if this is normal behavior, or if I should change the way I write the intents' training sentences.

I don't know if it's better to create entities when I want to train some intents with just words or small groups of words (2-4 word sentences). I noticed that even for longer sentences the probability score is pretty low, lower than when both parsers are enabled.

@adrienball
Contributor

The probabilities returned by the ProbabilisticIntentParser are not affected by whether the other parser is enabled or not.
The feedback we get about probabilities is that they are usually lower than what people expect. For now there is not much we can do about it if we stick with the current model (logistic regression), which performs quite well in terms of accuracy. In theory, you shouldn't have to worry about the absolute probabilities, as the intent which is returned is supposed to be the most likely one. However, comparing probabilities among the top intents could be useful, but for now only the most likely intent is returned. See the thread about this: #623.

You shouldn't have to use entities if you don't need to, but it can indeed be faster to generate entity values from the intent vocabulary (if you know it exhaustively) instead of writing many sentences that try to cover all the vocabulary. So that's something to try; a rough sketch of what such a dataset fragment could look like is below.
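
Something like this (a sketch only, using a hypothetical "botTopic" entity and "topic" slot name; check the exact field names against the JSON dataset format documentation):

dataset = {
    "language": "fr",
    "entities": {
        "botTopic": {  # hypothetical entity listing the vocabulary you know exhaustively
            "data": [
                {"value": "réel", "synonyms": ["vrai", "humain"]},
            ],
            "use_synonyms": True,
            "automatically_extensible": False,
        }
    },
    "intents": {
        "whoAreYouFR": {
            "utterances": [
                {
                    "data": [
                        {"text": "es-tu "},
                        {"text": "réel", "entity": "botTopic", "slot_name": "topic"},
                    ]
                }
            ]
        }
    },
}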

Intent classification relies on characteristic words, so you can also try to add more sentences and have them contain some domain-specific words.

@lucasvercelot
Author

Alright, thank you very much for all this information and for your help @adrienball! 😄
