merge Dev v1.1.0 - curriculum with yaml files

Dev v1.1.0
beyretb · Sep 16, 2019 · 2d5155d
2 parents 6ebfa72 + 5fe2439
commit 2d5155d

Showing 25 changed files with 398 additions and 57 deletions.
diff --git a/README.md b/README.md
@@ -172,6 +172,9 @@ features with the agent's frames in order to have frames in line with the config
 
 ## Version History
 
+- v1.1.0
+    - Add curriculum learning to `animalai-train` to use yaml configurations
+
 - v1.0.5
     - ~~Adds customisable resolution during evaluation~~ (removed, evaluation is only `84x84`)
     - Update `animalai-train` to tf 1.14 to fix `gin` broken dependency

diff --git a/animalai/setup.py b/animalai/setup.py
@@ -2,7 +2,7 @@
 
 setup(
     name='animalai',
-    version='1.0.5',
+    version='1.1.0',
     description='Animal AI competition interface',
     url='https://github.com/beyretb/AnimalAI-Olympics',
     author='Benjamin Beyret',

diff --git a/documentation/Curriculum/0.png b/documentation/Curriculum/0.png
diff --git a/documentation/Curriculum/1.png b/documentation/Curriculum/1.png
diff --git a/documentation/Curriculum/2.png b/documentation/Curriculum/2.png
diff --git a/documentation/Curriculum/3.png b/documentation/Curriculum/3.png
diff --git a/documentation/Curriculum/4.png b/documentation/Curriculum/4.png
diff --git a/documentation/Curriculum/5.png b/documentation/Curriculum/5.png
diff --git a/documentation/Curriculum/learning.png b/documentation/Curriculum/learning.png
diff --git a/documentation/Curriculum/lessons.png b/documentation/Curriculum/lessons.png
diff --git a/documentation/README.md b/documentation/README.md
@@ -5,6 +5,7 @@ You can find here the following documentation:
 - [The quickstart guide](quickstart.md)
 - [How to design configuration files](configFile.md)
 - [How training works](training.md)
+- [Add a curriculum to your training using animalai-train](curriculum.md)
 - [All the objects you can include in the arenas as well as their specifications](definitionsOfObjects.md)
 - [How to submit your agent](submission.md)
 - [A guide to train on AWS](cloudTraining.md)

diff --git a/documentation/curriculum.md b/documentation/curriculum.md
@@ -0,0 +1,95 @@
+# Curriculum Learning
+
+The `animalai-train` package contains a curriculum learning feature where you can specify a set of configuration files 
+which constitute lessons as part of the curriculum. See the 
+[ml-agents documentation](https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Training-Curriculum-Learning.md) 
+on curriculum learning for an overview of the technique. Our implementation is adapted from the ml-agents one, to use 
+configuration files rather than environment parameters (which don't exist in `animalai`).
+
+## Meta Curriculum
+
+To define a curriculum you will need to provide the following:
+
+- lessons (or levels), generally of increasing difficulty, that your agent will learn on, switching from easy to more difficult 
+- a metric you want to monitor to switch from one level to the next
+- the threshold value of that metric for each switch between levels
+
+In practice, you will place these parameters in a `json` file named after the brain in the environment (`Learner.json` in 
+our case), and place this file in a folder with all the configuration files you wish to use. This constitutes what we call 
+a meta-curriculum.
+
+## Example
+
+An example is provided in [the example folder](../examples/configs/curriculum). The idea of this curriculum is to train 
+an agent to navigate a maze by creating maze-like structures of perpendicular walls, starting with a single wall and food
+and adding one more wall at each level. Below are samples from the 6 different levels.
+
+
+
+![](Curriculum/0.png) |![](Curriculum/1.png)|![](Curriculum/2.png)|
+:--------------------:|:-------------------:|:-------------------:
+![](Curriculum/3.png) |![](Curriculum/4.png)|![](Curriculum/5.png)|
+
+To produce such a curriculum, we define the meta-curriculum in the following `json` format:
+
+```
+{
+  "measure": "reward",
+  "thresholds": [
+    1.5,
+    1.4,
+    1.3,
+    1.2,
+    1.1
+  ],
+  "min_lesson_length": 100,
+  "signal_smoothing": true,
+  "configuration_files": [
+    "0.yaml",
+    "1.yaml",
+    "2.yaml",
+    "3.yaml",
+    "4.yaml",
+    "5.yaml"
+  ]
+}
+```
+
+All parameters are the same as in [ml-agents](https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Training-Curriculum-Learning.md), 
+except for the `configuration_files`. From the ml-agents documentation:
+
+* `measure` - What to measure learning progress, and advancement in lessons by.
+  * `reward` - Uses a measure received reward.
+  * `progress` - Uses ratio of steps/max_steps.
+* `thresholds` (float array) - Points in value of `measure` where lesson should
+  be increased.
+* `min_lesson_length` (int) - The minimum number of episodes that should be
+  completed before the lesson can change. If `measure` is set to `reward`, the
+  average cumulative reward of the last `min_lesson_length` episodes will be
+  used to determine if the lesson should change. Must be nonnegative.
+
+  __Important__: the average reward that is compared to the thresholds is
+  different than the mean reward that is logged to the console. For example,
+  if `min_lesson_length` is `100`, the lesson will increment after the average
+  cumulative reward of the last `100` episodes exceeds the current threshold.
+  The mean reward logged to the console is dictated by the `summary_freq`
+  parameter in the
+  [trainer configuration file](../examples/configs/trainer_config.yaml).
+* `signal_smoothing` (true/false) - Whether to weight the current progress
+  measure by previous values.
+  * If `true`, weighting will be 0.75 (new) 0.25 (old).
+
+ The `configuration_files` parameter is simply a list of file names which contain the lessons, in the order they should be loaded.
+ Note that if you have `n` lessons, you need to define `n-1` thresholds.
+
+
+ ## Training
+
+ Once the folder is created, training is done in the same way as before, but now we pass a `MetaCurriculum` object to the
+ `meta_curriculum` argument of a `TrainerController`.
+
+ We provide an example using the above curriculum in [examples/trainCurriculum.py](../examples/trainCurriculum.py).
+ Training this agent, you can see the lessons switch using tensorboard:
+
+ ![](Curriculum/learning.png)
+ ![](Curriculum/lessons.png)
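
The rules stated in the new `curriculum.md` (required keys, `n-1` thresholds for `n` lessons, lesson files stored next to the JSON file) can be checked before launching a training run. The following is a minimal sketch under stated assumptions, not part of this commit: `check_meta_curriculum` and `demo` are hypothetical names, and the brain name `Learner` is taken from the documentation above.

```python
import json
import os
import tempfile

REQUIRED_KEYS = ("configuration_files", "measure", "thresholds",
                 "min_lesson_length", "signal_smoothing")


def check_meta_curriculum(folder, brain_name="Learner"):
    """Hypothetical helper: list the problems found in a meta-curriculum folder."""
    json_path = os.path.join(folder, brain_name + ".json")
    if not os.path.isfile(json_path):
        return ["missing {}.json".format(brain_name)]
    with open(json_path) as handle:
        data = json.load(handle)
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        return ["missing keys: " + ", ".join(missing)]

    problems = []
    lessons, thresholds = data["configuration_files"], data["thresholds"]
    # n lessons need n - 1 thresholds.
    if len(lessons) != len(thresholds) + 1:
        problems.append("{} lessons need {} thresholds, found {}".format(
            len(lessons), len(lessons) - 1, len(thresholds)))
    # Every lesson file must sit next to the JSON file.
    present = set(os.listdir(folder))
    problems.extend("lesson file not found: " + name
                    for name in lessons if name not in present)
    return problems


def demo():
    """Build a deliberately broken folder and return the reported problems."""
    with tempfile.TemporaryDirectory() as folder:
        for name in ("0.yaml", "1.yaml"):  # 2.yaml is left out on purpose
            open(os.path.join(folder, name), "w").close()
        meta = {
            "measure": "reward",
            "thresholds": [1.5, 1.4, 1.3],  # one too many for 3 lessons
            "min_lesson_length": 100,
            "signal_smoothing": True,
            "configuration_files": ["0.yaml", "1.yaml", "2.yaml"],
        }
        with open(os.path.join(folder, "Learner.json"), "w") as handle:
            json.dump(meta, handle)
        return check_meta_curriculum(folder)


if __name__ == "__main__":
    for problem in demo():
        print(problem)
```

Returning a list of problems rather than raising on the first one lets a user fix a folder in a single pass; the `Curriculum` class in this commit raises `CurriculumError` instead.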
diff --git a/examples/animalai_train/animalai_train/trainers/curriculum.py b/examples/animalai_train/animalai_train/trainers/curriculum.py
@@ -3,17 +3,19 @@
 import math
 
 from .exception import CurriculumError
+from animalai.envs.arena_config import ArenaConfig
 
 import logging
 
 logger = logging.getLogger('mlagents.trainers')
 
 
 class Curriculum(object):
-    def __init__(self, location):
+    def __init__(self, location, yaml_files):
         """
         Initializes a Curriculum object.
         :param location: Path to JSON defining curriculum.
+        :param yaml_files: A list of configuration files for each lesson
         """
         self.max_lesson_num = 0
         self.measure = None
@@ -32,7 +34,7 @@ def __init__(self, location):
             raise CurriculumError('There was an error decoding {}'
                                   .format(location))
         self.smoothing_value = 0
-        for key in ['parameters', 'measure', 'thresholds',
+        for key in ['configuration_files', 'measure', 'thresholds',
                     'min_lesson_length', 'signal_smoothing']:
             if key not in self.data:
                 raise CurriculumError("{0} does not contain a "
@@ -43,18 +45,25 @@ def __init__(self, location):
         self.min_lesson_length = self.data['min_lesson_length']
         self.max_lesson_num = len(self.data['thresholds'])
 
-        parameters = self.data['parameters']
-        for key in parameters:
-            # if key not in default_reset_parameters:
-            #     raise CurriculumError(
-            #         'The parameter {0} in Curriculum {1} is not present in '
-            #         'the Environment'.format(key, location))
-            if len(parameters[key]) != self.max_lesson_num + 1:
-                raise CurriculumError(
-                    'The parameter {0} in Curriculum {1} must have {2} values '
-                    'but {3} were found'.format(key, location,
-                                                self.max_lesson_num + 1,
-                                                len(parameters[key])))
+        configuration_files = self.data['configuration_files']
+        # for key in configuration_files:
+        # if key not in default_reset_parameters:
+        #     raise CurriculumError(
+        #         'The parameter {0} in Curriculum {1} is not present in '
+        #         'the Environment'.format(key, location))
+        if len(configuration_files) != self.max_lesson_num + 1:
+            raise CurriculumError(
+                'The parameter {0} in Curriculum {1} must have {2} values '
+                'but {3} were found'.format('configuration_files', location,
+                                            self.max_lesson_num + 1,
+                                            len(configuration_files)))
+        folder = os.path.dirname(location)
+        folder_yaml_files = os.listdir(folder)
+        if not all([file in folder_yaml_files for file in configuration_files]):
+            raise CurriculumError(
+                'One or more configuration file(s) in curriculum {0} could not be found'.format(location)
+            )
+        self.configurations = [ArenaConfig(os.path.join(folder, file)) for file in configuration_files]
 
     @property
     def lesson_num(self):
@@ -79,15 +88,13 @@ def increment_lesson(self, measure_val):
         if self.lesson_num < self.max_lesson_num:
             if measure_val > self.data['thresholds'][self.lesson_num]:
                 self.lesson_num += 1
-                config = {}
-                parameters = self.data['parameters']
-                for key in parameters:
-                    config[key] = parameters[key][self.lesson_num]
-                logger.info('{0} lesson changed. Now in lesson {1}: {2}'
+                # config = {}
+                # parameters = self.data['parameters']
+                # for key in parameters:
+                #     config[key] = parameters[key][self.lesson_num]
+                logger.info('{0} lesson changed. Now in lesson {1}'
                             .format(self._brain_name,
-                                    self.lesson_num,
-                                    ', '.join([str(x) + ' -> ' + str(config[x])
-                                        for x in config])))
+                                    self.lesson_num))
                 return True
         return False
 
@@ -103,8 +110,8 @@ def get_config(self, lesson=None):
         if lesson is None:
             lesson = self.lesson_num
         lesson = max(0, min(lesson, self.max_lesson_num))
-        config = {}
-        parameters = self.data['parameters']
-        for key in parameters:
-            config[key] = parameters[key][lesson]
+        config = self.configurations[lesson]
+        # parameters = self.data['parameters']
+        # for key in parameters:
+        #     config[key] = parameters[key][lesson]
         return config
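
To make the lesson-switching rule concrete, here is a small self-contained sketch of the threshold comparison seen in `increment_lesson`, combined with the smoothing weights quoted in the documentation (0.75 for the new value, 0.25 for the old one). `LessonTracker` is a hypothetical stand-in, not the `Curriculum` class from this diff, and the exact order of the smoothing and threshold steps is an assumption.

```python
class LessonTracker:
    """Hypothetical stand-in for the lesson-switching logic (not animalai-train code)."""

    def __init__(self, thresholds, signal_smoothing=True):
        self.thresholds = thresholds
        self.signal_smoothing = signal_smoothing
        self.smoothing_value = 0.0  # starts at 0, as in Curriculum.__init__
        self.lesson_num = 0

    def increment_lesson(self, measure_val):
        """Return True if the measure moves the tracker to the next lesson."""
        if self.signal_smoothing:
            # Weight the new measurement 0.75 and the previous smoothed value 0.25.
            measure_val = 0.25 * self.smoothing_value + 0.75 * measure_val
            self.smoothing_value = measure_val
        # The last lesson has no threshold, so it can never be left.
        if self.lesson_num < len(self.thresholds):
            if measure_val > self.thresholds[self.lesson_num]:
                self.lesson_num += 1
                return True
        return False


if __name__ == "__main__":
    tracker = LessonTracker([1.5, 1.4])
    for step in range(4):
        changed = tracker.increment_lesson(1.8)
        print(step, changed, tracker.lesson_num)
```

Because the smoothed value starts at 0, the first measurement of 1.8 is pulled down to 1.35 and does not clear the 1.5 threshold; the second one does.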
diff --git a/examples/animalai_train/animalai_train/trainers/meta_curriculum.py b/examples/animalai_train/animalai_train/trainers/meta_curriculum.py
@@ -20,34 +20,41 @@ def __init__(self, curriculum_folder):
         Args:
             curriculum_folder (str): The relative or absolute path of the
                 folder which holds the curriculums for this environment.
-                The folder should contain JSON files whose names are the
-                brains that the curriculums belong to.
+                The folder should contain one JSON file whose name matches
+                the brain in the academy (e.g. Learner) and which holds the
+                curriculum parameters, together with all the YAML
+                configuration files, one per curriculum lesson.
         """
-        used_reset_parameters = set()
+        # used_reset_parameters = set()
         self._brains_to_curriculums = {}
+        self._configuration_files = []
 
         try:
-            for curriculum_filename in os.listdir(curriculum_folder):
+            json_files = [file for file in os.listdir(curriculum_folder) if '.json' in file.lower()]
+            yaml_files = [file for file in os.listdir(curriculum_folder) if
+                          ('.yaml' in file.lower() or '.yml' in file.lower())]
+            for curriculum_filename in json_files:
                 brain_name = curriculum_filename.split('.')[0]
                 curriculum_filepath = \
                     os.path.join(curriculum_folder, curriculum_filename)
-                curriculum = Curriculum(curriculum_filepath)
+                curriculum = Curriculum(curriculum_filepath, yaml_files)
 
+                # ===== TO REMOVE ??? ===========
                 # Check if any two curriculums use the same reset params.
-                if any([(parameter in curriculum.get_config().keys())
-                    for parameter in used_reset_parameters]):
-                    logger.warning('Two or more curriculums will '
-                                'attempt to change the same reset '
-                                'parameter. The result will be '
-                                'non-deterministic.')
-
-                used_reset_parameters.update(curriculum.get_config().keys())
+                # if any([(parameter in curriculum.get_config().keys())
+                #         for parameter in used_reset_parameters]):
+                #     logger.warning('Two or more curriculums will '
+                #                    'attempt to change the same reset '
+                #                    'parameter. The result will be '
+                #                    'non-deterministic.')
+                #
+                # used_reset_parameters.update(curriculum.get_config().keys())
+                # ===== end of to remove =========
                 self._brains_to_curriculums[brain_name] = curriculum
         except NotADirectoryError:
             raise MetaCurriculumError(curriculum_folder + ' is not a '
-                                      'directory. Refer to the ML-Agents '
-                                      'curriculum learning docs.')
-
+                                                          'directory. Refer to the ML-Agents '
+                                                          'curriculum learning docs.')
 
     @property
     def brains_to_curriculums(self):
@@ -83,7 +90,7 @@ def _lesson_ready_to_increment(self, brain_name, reward_buff_size):
             increment its lesson.
         """
         return reward_buff_size >= (self.brains_to_curriculums[brain_name]
-                                        .min_lesson_length)
+                                    .min_lesson_length)
 
     def increment_lessons(self, measure_vals, reward_buff_sizes=None):
         """Attempts to increments all the lessons of all the curriculums in this
@@ -108,14 +115,13 @@ def increment_lessons(self, measure_vals, reward_buff_sizes=None):
                 if self._lesson_ready_to_increment(brain_name, buff_size):
                     measure_val = measure_vals[brain_name]
                     ret[brain_name] = (self.brains_to_curriculums[brain_name]
-                                           .increment_lesson(measure_val))
+                                       .increment_lesson(measure_val))
         else:
             for brain_name, measure_val in measure_vals.items():
                 ret[brain_name] = (self.brains_to_curriculums[brain_name]
-                                       .increment_lesson(measure_val))
+                                   .increment_lesson(measure_val))
         return ret
 
-
     def set_all_curriculums_to_lesson_num(self, lesson_num):
         """Sets all the curriculums in this meta curriculum to a specified
         lesson number.
@@ -127,18 +133,17 @@ def set_all_curriculums_to_lesson_num(self, lesson_num):
         for _, curriculum in self.brains_to_curriculums.items():
             curriculum.lesson_num = lesson_num
 
-
     def get_config(self):
         """Get the combined configuration of all curriculums in this
         MetaCurriculum.
 
         Returns:
             A dict from parameter to value.
         """
-        config = {}
+        # config = {}
 
         for _, curriculum in self.brains_to_curriculums.items():
             curr_config = curriculum.get_config()
-            config.update(curr_config)
+            # config.update(curr_config)
 
-        return config
+        return curr_config
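
The file discovery in `MetaCurriculum.__init__` matches extensions with a substring test (`'.json' in file.lower()`), which would also pick up a name such as `Learner.json.bak`. A stricter variant, sketched here with the hypothetical helper `split_curriculum_files`, compares the final extension only. This is one possible alternative, not what the commit does.

```python
import os


def split_curriculum_files(filenames):
    """Hypothetical helper: separate JSON curricula from YAML lesson files."""
    json_files, yaml_files = [], []
    for name in filenames:
        # splitext returns only the last extension, e.g. ".bak" for "a.json.bak".
        extension = os.path.splitext(name)[1].lower()
        if extension == ".json":
            json_files.append(name)
        elif extension in (".yaml", ".yml"):
            yaml_files.append(name)
    return json_files, yaml_files


if __name__ == "__main__":
    names = ["Learner.json", "0.yaml", "1.YML", "Learner.json.bak", "notes.txt"]
    print(split_curriculum_files(names))
```

In practice the folder listing would come from `os.listdir(curriculum_folder)`, as in the diff above.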
diff --git a/examples/animalai_train/animalai_train/trainers/trainer_controller.py b/examples/animalai_train/animalai_train/trainers/trainer_controller.py
@@ -180,11 +180,11 @@ def _reset_env(self, env):
             environment.
         """
         if self.meta_curriculum is not None:
-            return env.reset(config=self.meta_curriculum.get_config())
+            return env.reset(arenas_configurations=self.meta_curriculum.get_config())
         else:
             if self.update_config:
-                return env.reset(arenas_configurations=self.config)
                 self.update_config = False
+                return env.reset(arenas_configurations=self.config)
             else:
                 return env.reset()
 

diff --git a/examples/animalai_train/setup.py b/examples/animalai_train/setup.py
@@ -2,7 +2,7 @@
 
 setup(
     name='animalai_train',
-    version='1.0.5',
+    version='1.1.0',
     description='Animal AI competition training library',
     url='https://github.com/beyretb/AnimalAI-Olympics',
     author='Benjamin Beyret',

diff --git a/examples/configs/curriculum/0.yaml b/examples/configs/curriculum/0.yaml
@@ -0,0 +1,23 @@
+!ArenaConfig
+arenas:
+  0: !Arena
+    t: 250
+    items:
+    - !Item
+      name: Wall
+      positions:
+      - !Vector3 {x: -1, y: 0, z: 10}
+      colors:
+      rotations: [90]
+      sizes:
+      - !Vector3 {x: 1, y: 5, z: 9}
+    - !Item
+      name: GoodGoal
+      positions:
+        - !Vector3 {x: -1, y: 0, z: 35}
+      sizes:
+        - !Vector3 {x: 2, y: 2, z: 2}
+    - !Item
+      name: Agent
+      positions:
+        - !Vector3 {x: -1, y: 1, z: 5}