Cockatiel #149
base: master
Conversation
Signed-off-by: Frederic Boisnard <[email protected]>
This patch separates the methods dedicated to image processing/plotting into separate subclasses, so that Cockatiel and the NLP-related classes do not have to inherit them.
Currently the fit() method of Craft uses a hardcoded sampler by default. This patch allows Craft to use a sampler passed as a parameter.
Force-pushed the branch from f266d96 to 120052a.
Force-pushed the branch from 120052a to 3326a23.
The conflicts must come from the first patch, "test moving the EPSILON & similar to free some RAM". I will remove it from the PR since it was mostly an attempt to improve RAM management on my side (I am not sure we want to take it, and if we do, it would be better in a separate PR).
I have only read part of the PR for now. We will have to discuss the architecture of the code. It seems that Cockatiel forces us to introduce NLP attributions.
@@ -16,4 +16,5 @@
from .object_detector import BoundingBoxesExplainer
from .global_sensitivity_analysis import SobolAttributionMethod, HsicAttributionMethod
from .gradient_statistics import SmoothGrad, VarGrad, SquareGrad
from .nlp_occlusion import NlpOcclusion
We should discuss where to place this, but in my view, it should not be in the same folder as the other attribution methods. My suggestions:
xplique.attributions.nlp.occlusion
xplique.nlp.attributions.occlusion
@@ -40,10 +40,6 @@ class GradCAMPP(GradCAM):
        If a string is provided it will look for the layer name.
        """

# Avoid zero division during procedure. (the value is not important, as if the denominator is
# zero, then the nominator will also be zero).
EPSILON = tf.constant(1e-4)
Why was it necessary to move this?
def explain(self,
            sentence: str,
            words: List[str],
I would allow an easy way not to specify words so that all the words are occluded, such as passing None.
I think we should talk about this one also because it requires sentence-splitting, and there are many possibilities. Do we include punctuation? Do we split possessive parts? Do we allow the user to decide?
In my view, we should find a simple library that does this and refer to their API for possible parameters. However, this raises the question: do we include such a library in Xplique's initial requirements, or do we create an additional set of NLP requirements?
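The `None` case discussed above could be sketched as follows; the regex splitter and the `words=None` default are illustrative assumptions, not the PR's actual API:

```python
import re

# Hedged sketch: if `words` is None, derive the word list from the sentence
# itself so that every word gets occluded. The `resolve_words` name and the
# regex-based splitter are assumptions, not the PR's behaviour.
def resolve_words(sentence, words=None):
    if words is None:
        # \w+(?:'\w+)? keeps possessives ("cat's") together and drops punctuation.
        words = re.findall(r"\w+(?:'\w+)?", sentence)
    return words

resolve_words("The cat's bowl is empty.")
# → ["The", "cat's", "bowl", "is", "empty"]
```

A real implementation would likely delegate this splitting to a dedicated NLP library, which is exactly the requirements question raised above.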
perturbated_words = NlpOcclusion._apply_masks(words, masks)

perturbated_sentences = [sentence]
perturbated_sentences.extend(
Here the perturbed sentences lack words from the initial sentence: you have just concatenated the perturbed words. For example, if I give the following inputs:
sentence = "My cat is in the garden."
words = ["cat", "garden"]
separator = " "
the perturbed_sentences value will be ["My cat is in the garden.", "garden", "cat"].
I do not think this is what we expect.
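A minimal sketch of the behaviour one would expect instead, under the assumption that a perturbed sentence should keep the non-occluded words in their original positions; all names here are illustrative, not the PR's code:

```python
# Build each perturbed sentence by removing the occluded words from the full
# sentence, instead of concatenating only the perturbed words.
def perturb_sentence(sentence, words, mask, separator=" "):
    occluded = {w for w, keep in zip(words, mask) if not keep}
    tokens = sentence.split(separator)
    # Strip trailing punctuation before comparing, so "garden." matches "garden".
    return separator.join(t for t in tokens if t.strip(".,!?") not in occluded)

sentence = "My cat is in the garden."
words = ["cat", "garden"]
perturbed = [perturb_sentence(sentence, words, m) for m in [(0, 1), (1, 0)]]
# → ["My is in the garden.", "My cat is in the"]
```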
def explain(self,
            sentence: str,
            words: List[str],
            separator: str) -> np.ndarray:
How do you handle the case where the user provides sentences with punctuation? This may completely change the meaning of a sentence. If you just concatenate words with spaces between them, the final sentence may not be readable.
return WordExtractor()
if extract_fct == "excerpt":
    return ExcerptExtractor()
raise ValueError("Extraction function can be only 'clause', \
You did not include "flaire_clause" and "spacy_clause" in the error message, but you left "clause", which is not a possibility.
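A sketch of the corrected error path, assuming the supported values are the ones mentioned in this thread ("word", "excerpt", "flaire_clause", "spacy_clause"); the helper name and the exact set of values are assumptions, not the PR's code:

```python
# Hypothetical helper: validate extract_fct and raise an error that lists the
# actual possibilities ("clause" alone is not one of them).
VALID_EXTRACT_FCTS = ("word", "excerpt", "flaire_clause", "spacy_clause")

def check_extract_fct(extract_fct):
    if extract_fct not in VALID_EXTRACT_FCTS:
        raise ValueError(
            f"Extraction function can only be one of {VALID_EXTRACT_FCTS}, "
            f"got '{extract_fct}'."
        )
```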
Factory for extractor classes.
"""
@staticmethod
def get_extractor(extract_fct="sentence", tagger=None, pipeline=None, clause_type=None):
I think that language should be an argument of this function.
@@ -7,7 +7,7 @@
import tensorflow as tf

from ..types import Callable, Optional
from ..utils_functions.object_detection import _box_iou, _format_objects, _EPSILON
I agree that the epsilon should not be loaded at the same time as the operators. But is it a good idea to initialize it at each run of the tf function?
Parameters
----------
tokenizer
    The tokenizer function to be used for tokenization. It must be
This description needs to be clarified as no model is defined here. Does this correspond to what you define in the nlp_commons?
If it corresponds to the tokenizer-model pair, as in HuggingFace, then I recommend having a class that treats them both at the same time. It is easier to understand and the API would be more natural.
from ..types import Union, Tuple, Callable, List, Dict


class NlpPreprocessor(ABC):
This class is called NLP... Should it not be in the nlp commons?
You wrote NlpProcessor; is it not more natural to call it NLPProcessor, as NLP is an acronym?
Great work Fred, this is a huge PR! A complex one at that, which raises several points where we should all agree before merging. I listed several general remarks:
- There is no description of the PR; it would be great to clarify your choices and what you are trying to do.
- There is no notebook or tutorial; I cannot review the plot function or the API without examples. Furthermore, one will be necessary for the final version.
- I cannot launch the tests (from GitHub); I think it is because of the conflicts.
- You did not base your code on the last version of master (1.3.3). I think the memory problems you had were solved by the last pull requests; rebasing should resolve the conflicts at the same time.
- You did not write the documentation; it might be better to discuss the choices made before writing it. It will save you some time.
- You redid a lot of indentation (automatically, I think), but it created many weird code parts. The line limit is 100 in Xplique.
- We will have to discuss two choices:
  - Where do we put the NLP parts? In my view an xplique.nlp module is pertinent.
  - The structure of the code (multiple inheritance).
- The code you added introduces a lot of dependencies on other libraries (flair, nltk, transformers (for tests)); we should discuss how to treat this. Maybe have an additional nlp_requirements.txt file to install when calling xplique.nlp? Furthermore, we should check the licenses of these libraries; they should be MIT, otherwise we cannot use them.
self,
input_to_latent_model: Callable,
latent_to_logit_model: Callable,
preprocessor: NlpPreprocessor,
Here it seems that the user has to wrap their tokenizer using NlpPreprocessor. Is it possible for the user to give us their tokenizer and for us to wrap it ourselves? Otherwise, it adds a step for the user. I would prefer the API to be as simple as possible.
Yes, I understand your concern; it is possible, but one of the goals of NlpPreprocessor is to gather both the tokenizer and its parameters in one place, in order to 'proxy' the calls to the tokenizer. Then we can use this feature in Cockatiel but also in external parts such as nlp_batch_predict():
- cockatiel._latent_predict() calls self.preprocessor.tokenize(), which passes the right arguments to the tokenizer
- in torch_operations.nlp_batch_predict(preprocessor), we use preprocessor.preprocess(), which again calls preprocessor.tokenize() to pass the right arguments to the tokenizer
If we pass the tokenizer alone to cockatiel.init(), then we will have to provide a variable number of additional parameters (tokenizer arguments, which can vary from one tokenizer to another) and duplicate these parameters in nlp_batch_predict().
Or perhaps there is a better or easier way to implement this?
Maybe I should rename NlpPreprocessor to something like TokenizerHelper to better show that it simply encapsulates tokenizer calls?
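The idea described above could be sketched like this; TokenizerHelper, its methods, and the dummy tokenizer are illustrative names, not the actual PR code:

```python
# Hypothetical helper that binds a tokenizer to its keyword arguments, so every
# call site (Cockatiel, nlp_batch_predict, ...) gets the same pre-bound settings
# without duplicating them.
class TokenizerHelper:
    def __init__(self, tokenizer, **tokenizer_kwargs):
        self.tokenizer = tokenizer
        self.tokenizer_kwargs = tokenizer_kwargs

    def tokenize(self, texts):
        # Forward the pre-bound arguments on every call.
        return self.tokenizer(texts, **self.tokenizer_kwargs)

# Usage with a dummy tokenizer standing in for e.g. a HuggingFace one:
def dummy_tokenizer(texts, padding=False, truncation=False):
    return [t.split() for t in texts]

helper = TokenizerHelper(dummy_tokenizer, padding=True, truncation=True)
tokens = helper.tokenize(["My cat is in the garden"])
```

This keeps the tokenizer arguments in one place, which is the stated goal; whether the wrapping happens inside Cockatiel's init or is done by the user is the open API question above.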
super().__init__(input_to_latent_model, latent_to_logit_model,
                 number_of_concepts, batch_size, device=device)
self.preprocessor = preprocessor
self.patch_extractor = ExcerptExtractor()
You are forcing the patch_extractor here; should it not be an argument of the init function?
self.preprocessor = preprocessor
self.patch_extractor = ExcerptExtractor()

def _latent_predict(self, inputs: List[str], resize=None) -> torch.Tensor:
It seems that in this function you make very similar computations to torch_operations.nlp_batch_predict. What is the reason for not using it, or extracting the common part between the two?
    return self._to_np_array(activations)

def _extract_patches(self, inputs: List[str]) -> Tuple[List[str], np.ndarray]:
The name is not clear enough because it also computes embeddings. I suggest using _crop_and_embed, _split_text_and_embed, or _from_inputs_to_latent_patches.
Solved in previous pull requests, you should rebase on the last version of master.
Same, previously solved.
assert np.array_equal(masks, expected_mask)

def test_apply_masks():
I detected a problem with the Occlusion explain function. If you change the _apply_mask method so that it also takes the sentence as input and returns the perturbed_inputs, then you will be able to test it more precisely.
import torch.nn as nn

from torch.nn import MSELoss
from transformers import RobertaPreTrainedModel, RobertaModel
This creates quite a big dependency for Xplique's tests. The way to manage dev requirements should also be discussed.
nb_excerpts = 2
nb_most_important_concepts = 2

best_sentences_per_concept = cockatiel_explainer_pos.get_best_excerpts_per_concept(
How can you be sure of which sentence will correspond to which concept?
First PR about the integration of Cockatiel in Xplique.