diff --git a/.gitignore b/.gitignore index dfd1107a..a1fc126d 100644 --- a/.gitignore +++ b/.gitignore @@ -1,9 +1,15 @@ env/* !env/README.md -/models -/summaries +models/ +summaries/ /.idea __pycache__/ UnitySDK.log -/venv -/dev \ No newline at end of file +venv/ +*/venv +/dev +build/ +dist/ +*.egg-info* +*.eggs* +animalai_bkp/ \ No newline at end of file diff --git a/README.md b/README.md index 7c4110d0..c8f42117 100644 --- a/README.md +++ b/README.md @@ -3,12 +3,23 @@ ## Overview Welcome to the repository for the Animal-AI Olympics competition where you will find all the code needed to compete in -this new challenge. Note that for the moment this repo contains **only the training environment** (v0.1) that will be used for the competition and **does not contain any competition tests or information for entering**. If everything goes well the competition will be live on June 30th. Until then we will be continually updating with bug fixes and small changes to environment. However, the general structure will stay the same so it's not too early to start working with the environment. For more information on the competition itself and to stay updated with any developments, head to the [Competition Website](http://www.animalaiolympics.com/) and follow [@MacroPhilosophy](https://twitter.com/MacroPhilosophy) and [@BenBeyret](https://twitter.com/BenBeyret) on twitter. - -The environment contains an agent enclosed in a fixed sized arena. Objects can spawn in this arena, including positive and negative rewards (green, yellow and red spheres). All of the hidden tests that will appear in the competition are made using the objects in the training environment. We have provided some sample environment configurations that should be useful for training, but part of the challenge will be experimenting and designing new configurations. - -The goal of this first release is to **seek feedback from the community** as well as to provide the environment for research prior to the launch of the competition itself. The competition version of the environment will be similar to this one, however we are open to suggestion (for minor changes) and especially bug reports! Head over to the [issues page](https://github.com/beyretb/AnimalAI-Olympics/issues) and open a ticket using the `suggestion` or `bug` labels -respectively. +this new challenge. Note that for the moment this repo contains **only the training environment** (v0.5) that will be +used for the competition and **does not contain any competition tests or information for entering**. If everything goes +well the competition will be live on July 1st. Until then we will be continually updating with bug fixes and small +changes to the environment. However, the general structure will stay the same so it's not too early to start working with the environment. For more information on the competition itself and to stay updated with any developments, head to the +[Competition Website](http://www.animalaiolympics.com/) and follow [@MacroPhilosophy](https://twitter.com/MacroPhilosophy) +and [@BenBeyret](https://twitter.com/BenBeyret) on twitter. + +The environment contains an agent enclosed in a fixed sized arena. Objects can spawn in this arena, including positive +and negative rewards (green, yellow and red spheres). All of the hidden tests that will appear in the competition are +made using the objects in the training environment. 
We have provided some sample environment configurations that should +be useful for training, but part of the challenge will be experimenting and designing new configurations. + +The goal of this first release is to **seek feedback from the community** as well as to provide the environment for +research prior to the launch of the competition itself. The competition version of the environment will be similar to +this one, however we are open to suggestion (for minor changes) and especially bug reports! Head over to the +[issues page](https://github.com/beyretb/AnimalAI-Olympics/issues) and open a ticket using the `suggestion` or `bug` +labels respectively. To get started install the requirements below, and then follow the [Quick Start Guide](documentation/quickstart.md). A more in depth documentation can be found on the @@ -16,37 +27,51 @@ A more in depth documentation c ## Development Blog -You can read the development blog [here](https://mdcrosby.com/blog). It covers further details about the competition as well as part of the development process. +You can read the development blog [here](https://mdcrosby.com/blog). It covers further details about the competition as +well as part of the development process. 1. [Why Animal-AI?](https://mdcrosby.com/blog/animalai1.html) 2. [The Syllabus (Part 1)](https://mdcrosby.com/blog/animalai2.html) -## Requirements - -The Animal-AI package works on most platforms. - -First of all your will need `python3.6` installed. You will find a list of requirements in the `requirements*.txt` files. -Using `pip` you can run: +3. [The Syllabus (Part 2): Lights Out](https://mdcrosby.com/blog/animalai3.html) -on Linux and mac: -``` -pip install -r requirementsOthers.txt -``` +## Requirements -on windows: -``` -pip install -r requirementsWindows.txt -``` -**Note:** `python3.6` is required to install `tensorflow>=1.7,<1.8` which is only used for the training script we provide as an example. Should you wish to use another version of python you can remove the first line from the requirement files. You will still be able to use the `visualizeArena.py` script, but not the `train.py` one. +The Animal-AI package works on Linux, Mac and Windows, as well as most Cloud providers. + + +First of all you will need `python3.6` installed. We recommend using a virtual environment specifically for the competition. We provide two packages for +this competition: + +- The main one is an API for interfacing with the Unity environment. It contains both a +[gym environment](https://github.com/openai/gym) as well as an extension of Unity's +[ml-agents environments](https://github.com/Unity-Technologies/ml-agents/tree/master/ml-agents-envs). You can install it + via pip: + ``` + pip install animalai + ``` + Or you can install it from source, head to the `animalai/` folder and run `pip install -e .` + +- We also provide a package that can be used as a starting point for training, and which is required to run most of the +example scripts found in the `examples/` folder. It contains an extension of +[ml-agents' training environment](https://github.com/Unity-Technologies/ml-agents/tree/master/ml-agents) that relies on +[OpenAI's PPO](https://openai.com/blog/openai-baselines-ppo/), as well as +[Google's dopamine](https://github.com/google/dopamine) which implements +[Rainbow](https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/download/17204/16680) (among others).
You can also install +this package using pip: + ``` + pip install animalai-train + ``` + Or you can install it from source, head to `examples/animalai_train` and run `pip install -e .` Finally download the environment for your system: | OS | Environment link | | --- | --- | -| Linux | [download v0.4](https://www.doc.ic.ac.uk/~bb1010/animalAI/env_linux_v0.4.zip) | -| MacOS | [download v0.4](https://www.doc.ic.ac.uk/~bb1010/animalAI/env_mac_v0.4.zip) | -| Windows | [download v0.4](https://www.doc.ic.ac.uk/~bb1010/animalAI/env_windows_v0.4.zip) | +| Linux | [download v0.5](https://www.doc.ic.ac.uk/~bb1010/animalAI/env_linux_v0.5.zip) | +| MacOS | [download v0.5](https://www.doc.ic.ac.uk/~bb1010/animalAI/env_mac_v0.5.zip) | +| Windows | [download v0.5](https://www.doc.ic.ac.uk/~bb1010/animalAI/env_windows_v0.5.zip) | You can now unzip the content of the archive to the `env` folder and you're ready to go! Make sure the executable `AnimalAI.*` is in `env/`. On linux you may have to make the file executable by running `chmod +x env/AnimalAI.x86_64`. @@ -66,39 +91,43 @@ mode. Here you can control the agent with the following: | C | switch camera | | R | reset environment | -**Note**: on some platforms, playing manually in full screen makes the environment slow, keep the environment in window -mode for better performance. - ## Competition Tests -We will be releasing further details about the tests in the competition over the coming weeks. The tests will be split into multiple categories from the very simple (e.g. **food retrieval**, **preferences**, and **basic obstacles**) to the more complex (e.g. **working memory**, **spatial memory**, **object permanence**, and **object manipulation**). For now we have included multiple example config files that each relate to a different category. As we release further details we will also specify the rules for the type of tests that can appear in each category. Note that the example config files are just simple examples to be used as a guide. An agent that solves even all of these perfectly may still not be able to solve all the tests in the categories but it would be off to a very good start. +We will be releasing further details about the tests in the competition over the coming weeks. The tests will be split +into multiple categories from the very simple (e.g. **food retrieval**, **preferences**, and **basic obstacles**) to +the more complex (e.g. **working memory**, **spatial memory**, **object permanence**, and **object manipulation**). For +now we have included multiple example config files that each relate to a different category. As we release further +details we will also specify the rules for the type of tests that can appear in each category. Note that the example +config files are just simple examples to be used as a guide. An agent that solves even all of these perfectly may still +not be able to solve all the tests in the categories but it would be off to a very good start. ## Citing -For now please cite the [Nature: Machine Intelligence piece](https://rdcu.be/bBCQt): +For now please cite the [Nature: Machine Intelligence piece](https://rdcu.be/bBCQt) for any work involving the competition environment: -Crosby, M., Beyret, B., Halina M. [The Animal-AI Olympics](https://www.nature.com/articles/s42256-019-0050-3) Nature Machine Intelligence 1 (5) p257 2019. +Crosby, M., Beyret, B., Halina M. [The Animal-AI Olympics](https://www.nature.com/articles/s42256-019-0050-3) Nature +Machine Intelligence 1 (5) p257 2019. 
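As a minimal sketch of how the two packages described in the requirements above fit together (an illustration rather than an official example: it assumes the executable was unzipped to `env/` as described above, and it lets the environment spawn objects randomly since no arena configuration is passed), the gym wrapper provided by `animalai` can be driven like any other gym environment:

```
from animalai.envs.gym.environment import AnimalAIEnv

# Assumed path to the executable unzipped into env/ (adjust for your system).
env = AnimalAIEnv(environment_filename='env/AnimalAI',
                  worker_id=0,
                  n_arenas=1,
                  retro=True)

obs = env.reset()
for _ in range(100):
    # Sample a random action from the flattened (retro) action space.
    obs, reward, done, info = env.step(env.action_space.sample())
    if done:
        obs = env.reset()
env.close()
```

Training proper is left to the `animalai-train` package and the example scripts in the `examples/` folder.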
## Unity ML-Agents The Animal-AI Olympics was built using [Unity's ML-Agents Toolkit.](https://github.com/Unity-Technologies/ml-agents) The Python library located in [animalai](animalai) is almost identical to -[ml-agents v0.7](https://github.com/Unity-Technologies/ml-agents/tree/master/ml-agents-envs). We only added the possibility to change the configuration of arenas between episodes. The documentation for ML-Agents can be found [here](https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Python-API.md). +[ml-agents v0.7](https://github.com/Unity-Technologies/ml-agents/tree/master/ml-agents-envs). We only added the +possibility to change the configuration of arenas between episodes. The documentation for ML-Agents can be found +[here](https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Python-API.md). Juliani, A., Berges, V., Vckay, E., Gao, Y., Henry, H., Mattar, M., Lange, D. (2018). [Unity: A General Platform for Intelligent Agents.](https://arxiv.org/abs/1809.02627) *arXiv preprint arXiv:1809.02627* ## Known Bugs -Occasionally will spawn an empty arena in play mode. Temporary fix: just press R to respawn. - -Occasional slow frame rates in play mode. Temporary fix: reduce screen size. +... ## TODO -- [ ] Offer a gym wrapper for training - [ ] Add protobuf for arena spawning feedback +- [x] Offer a gym wrapper for training - [x] Improve the way the agent spawns - [x] Add lights out configurations. - [x] Improve environment framerates @@ -106,17 +135,29 @@ Occasional slow frame rates in play mode. Temporary fix: reduce screen size. ## Version History +- v0.5 Package `animalai`, gym compatible, dopamine example, bug fixes + - Separate environment API and training API in Python + - Release both as `animalai` and `animalai-train` PyPI packages (for `pip` installs) + - Agent speed in play-mode constant across various platforms + - Provide Gym environment + - Add `trainBaselines.py` to train using `dopamine` and the Gym wrapper + - Create the `agent.py` interface for agent submissions + - Add the `HotZone` object (equivalent to the red zone but without death) + - v0.4 - Lights off moved to Unity, colors configurations, proportional goals, bugs fixes - The light is now directly switched on/off within Unity, configuration files stay the same - Blackouts now work with infinite episodes (`t=0`) - - The `rand_colors` configurations have been removed and the user can now pass `RGB` values, see [here](documentation/configFile.md#objects) - - Rewards for goals are now proportional to their size (except for the `DeathZone`), see [here](documentation/definitionsOfObjects.md#rewards) + - The `rand_colors` configurations have been removed and the user can now pass `RGB` values, see + [here](documentation/configFile.md#objects) + - Rewards for goals are now proportional to their size (except for the `DeathZone`), see + [here](documentation/definitionsOfObjects.md#rewards) - The agent is now a ball rather than a cube - Increased safety for spawning the agent to avoid infinite loops - Bugs fixes - v0.3 - Lights off, remove Beams and add cylinder - - We added the possibility to switch the lights off at given intervals, see [here](documentation/configFile.md#blackouts) + - We added the possibility to switch the lights off at given intervals, see + [here](documentation/configFile.md#blackouts) - visualizeLightsOff.py displays an example of lights off, from the agent's point of view - Beams objects have been removed - A `Cylinder` object has been added (similar behaviour to the
`Woodlog`) diff --git a/animalai/LICENSE b/animalai/LICENSE new file mode 100644 index 00000000..7ff5035e --- /dev/null +++ b/animalai/LICENSE @@ -0,0 +1,201 @@ + Apache License + Version 2.0, January 2004 + http://www.apache.org/licenses/ + + TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION + + 1. Definitions. + + "License" shall mean the terms and conditions for use, reproduction, + and distribution as defined by Sections 1 through 9 of this document. + + "Licensor" shall mean the copyright owner or entity authorized by + the copyright owner that is granting the License. + + "Legal Entity" shall mean the union of the acting entity and all + other entities that control, are controlled by, or are under common + control with that entity. For the purposes of this definition, + "control" means (i) the power, direct or indirect, to cause the + direction or management of such entity, whether by contract or + otherwise, or (ii) ownership of fifty percent (50%) or more of the + outstanding shares, or (iii) beneficial ownership of such entity. + + "You" (or "Your") shall mean an individual or Legal Entity + exercising permissions granted by this License. + + "Source" form shall mean the preferred form for making modifications, + including but not limited to software source code, documentation + source, and configuration files. + + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. + + "Work" shall mean the work of authorship, whether in Source or + Object form, made available under the License, as indicated by a + copyright notice that is included in or attached to the work + (an example is provided in the Appendix below). + + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other modifications + represent, as a whole, an original work of authorship. For the purposes + of this License, Derivative Works shall not include works that remain + separable from, or merely link (or bind by name) to the interfaces of, + the Work and Derivative Works thereof. + + "Contribution" shall mean any work of authorship, including + the original version of the Work and any modifications or additions + to that Work or Derivative Works thereof, that is intentionally + submitted to Licensor for inclusion in the Work by the copyright owner + or by an individual or Legal Entity authorized to submit on behalf of + the copyright owner. For the purposes of this definition, "submitted" + means any form of electronic, verbal, or written communication sent + to the Licensor or its representatives, including but not limited to + communication on electronic mailing lists, source code control systems, + and issue tracking systems that are managed by, or on behalf of, the + Licensor for the purpose of discussing and improving the Work, but + excluding communication that is conspicuously marked or otherwise + designated in writing by the copyright owner as "Not a Contribution." + + "Contributor" shall mean Licensor and any individual or Legal Entity + on behalf of whom a Contribution has been received by Licensor and + subsequently incorporated within the Work. + + 2. Grant of Copyright License. 
Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + copyright license to reproduce, prepare Derivative Works of, + publicly display, publicly perform, sublicense, and distribute the + Work and such Derivative Works in Source or Object form. + + 3. Grant of Patent License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + (except as stated in this section) patent license to make, have made, + use, offer to sell, sell, import, and otherwise transfer the Work, + where such license applies only to those patent claims licensable + by such Contributor that are necessarily infringed by their + Contribution(s) alone or by combination of their Contribution(s) + with the Work to which such Contribution(s) was submitted. If You + institute patent litigation against any entity (including a + cross-claim or counterclaim in a lawsuit) alleging that the Work + or a Contribution incorporated within the Work constitutes direct + or contributory patent infringement, then any patent licenses + granted to You under this License for that Work shall terminate + as of the date such litigation is filed. + + 4. Redistribution. You may reproduce and distribute copies of the + Work or Derivative Works thereof in any medium, with or without + modifications, and in Source or Object form, provided that You + meet the following conditions: + + (a) You must give any other recipients of the Work or + Derivative Works a copy of this License; and + + (b) You must cause any modified files to carry prominent notices + stating that You changed the files; and + + (c) You must retain, in the Source form of any Derivative Works + that You distribute, all copyright, patent, trademark, and + attribution notices from the Source form of the Work, + excluding those notices that do not pertain to any part of + the Derivative Works; and + + (d) If the Work includes a "NOTICE" text file as part of its + distribution, then any Derivative Works that You distribute must + include a readable copy of the attribution notices contained + within such NOTICE file, excluding those notices that do not + pertain to any part of the Derivative Works, in at least one + of the following places: within a NOTICE text file distributed + as part of the Derivative Works; within the Source form or + documentation, if provided along with the Derivative Works; or, + within a display generated by the Derivative Works, if and + wherever such third-party notices normally appear. The contents + of the NOTICE file are for informational purposes only and + do not modify the License. You may add Your own attribution + notices within Derivative Works that You distribute, alongside + or as an addendum to the NOTICE text from the Work, provided + that such additional attribution notices cannot be construed + as modifying the License. + + You may add Your own copyright statement to Your modifications and + may provide additional or different license terms and conditions + for use, reproduction, or distribution of Your modifications, or + for any such Derivative Works as a whole, provided Your use, + reproduction, and distribution of the Work otherwise complies with + the conditions stated in this License. + + 5. Submission of Contributions. 
Unless You explicitly state otherwise, + any Contribution intentionally submitted for inclusion in the Work + by You to the Licensor shall be under the terms and conditions of + this License, without any additional terms or conditions. + Notwithstanding the above, nothing herein shall supersede or modify + the terms of any separate license agreement you may have executed + with Licensor regarding such Contributions. + + 6. Trademarks. This License does not grant permission to use the trade + names, trademarks, service marks, or product names of the Licensor, + except as required for reasonable and customary use in describing the + origin of the Work and reproducing the content of the NOTICE file. + + 7. Disclaimer of Warranty. Unless required by applicable law or + agreed to in writing, Licensor provides the Work (and each + Contributor provides its Contributions) on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or + implied, including, without limitation, any warranties or conditions + of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A + PARTICULAR PURPOSE. You are solely responsible for determining the + appropriateness of using or redistributing the Work and assume any + risks associated with Your exercise of permissions under this License. + + 8. Limitation of Liability. In no event and under no legal theory, + whether in tort (including negligence), contract, or otherwise, + unless required by applicable law (such as deliberate and grossly + negligent acts) or agreed to in writing, shall any Contributor be + liable to You for damages, including any direct, indirect, special, + incidental, or consequential damages of any character arising as a + result of this License or out of the use or inability to use the + Work (including but not limited to damages for loss of goodwill, + work stoppage, computer failure or malfunction, or any and all + other commercial damages or losses), even if such Contributor + has been advised of the possibility of such damages. + + 9. Accepting Warranty or Additional Liability. While redistributing + the Work or Derivative Works thereof, You may choose to offer, + and charge a fee for, acceptance of support, warranty, indemnity, + or other liability obligations and/or rights consistent with this + License. However, in accepting such obligations, You may act only + on Your own behalf and on Your sole responsibility, not on behalf + of any other Contributor, and only if You agree to indemnify, + defend, and hold each Contributor harmless for any liability + incurred by, or claims asserted against, such Contributor by reason + of your accepting any such warranty or additional liability. + + END OF TERMS AND CONDITIONS + + APPENDIX: How to apply the Apache License to your work. + + To apply the Apache License to your work, attach the following + boilerplate notice, with the fields enclosed by brackets "{}" + replaced with your own identifying information. (Don't include + the brackets!) The text should be enclosed in the appropriate + comment syntax for the file format. We also recommend that a + file or class name and description of purpose be included on the + same "printed page" as the copyright notice for easier + identification within third-party archives. + + Copyright 2017 Unity Technologies + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. 
+ You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. \ No newline at end of file diff --git a/animalai/README.md b/animalai/README.md new file mode 100644 index 00000000..9df4cf5a --- /dev/null +++ b/animalai/README.md @@ -0,0 +1,14 @@ +# AnimalAI Python API + +This package provides the Python API used for training agents for the Animal AI Olympics competition. It is mostly an +extension of [Unity's MLAgents env](https://github.com/Unity-Technologies/ml-agents/tree/master/ml-agents-envs). + +It contains two ways of interfacing with the Unity environments: + +- `animalai.envs.environment` contains the `UnityEnvironment` which is similar to the one found in `mlagents` but with +a few adaptations to allow for more custom communications between Python and Unity. + +- `animalai.envs.gym.environment` contains the `AnimalAIEnv` which provides a gym environment to use directly with +baselines. + +For more details and documentation have a look at the [AnimalAI documentation](../documentation) \ No newline at end of file diff --git a/animalai/__init__.py b/animalai/__init__.py deleted file mode 100644 index e69de29b..00000000 diff --git a/animalai/animalai/__init__.py b/animalai/animalai/__init__.py new file mode 100644 index 00000000..c088da9f --- /dev/null +++ b/animalai/animalai/__init__.py @@ -0,0 +1 @@ +name= "animalai" diff --git a/animalai/communicator_objects/__init__.py b/animalai/animalai/communicator_objects/__init__.py similarity index 100% rename from animalai/communicator_objects/__init__.py rename to animalai/animalai/communicator_objects/__init__.py diff --git a/animalai/communicator_objects/agent_action_proto_pb2.py b/animalai/animalai/communicator_objects/agent_action_proto_pb2.py similarity index 100% rename from animalai/communicator_objects/agent_action_proto_pb2.py rename to animalai/animalai/communicator_objects/agent_action_proto_pb2.py diff --git a/animalai/communicator_objects/agent_info_proto_pb2.py b/animalai/animalai/communicator_objects/agent_info_proto_pb2.py similarity index 100% rename from animalai/communicator_objects/agent_info_proto_pb2.py rename to animalai/animalai/communicator_objects/agent_info_proto_pb2.py diff --git a/animalai/communicator_objects/arena_parameters_proto_pb2.py b/animalai/animalai/communicator_objects/arena_parameters_proto_pb2.py similarity index 100% rename from animalai/communicator_objects/arena_parameters_proto_pb2.py rename to animalai/animalai/communicator_objects/arena_parameters_proto_pb2.py diff --git a/animalai/communicator_objects/brain_parameters_proto_pb2.py b/animalai/animalai/communicator_objects/brain_parameters_proto_pb2.py similarity index 100% rename from animalai/communicator_objects/brain_parameters_proto_pb2.py rename to animalai/animalai/communicator_objects/brain_parameters_proto_pb2.py diff --git a/animalai/communicator_objects/command_proto_pb2.py b/animalai/animalai/communicator_objects/command_proto_pb2.py similarity index 100% rename from animalai/communicator_objects/command_proto_pb2.py rename to animalai/animalai/communicator_objects/command_proto_pb2.py diff --git a/animalai/communicator_objects/demonstration_meta_proto_pb2.py 
b/animalai/animalai/communicator_objects/demonstration_meta_proto_pb2.py similarity index 100% rename from animalai/communicator_objects/demonstration_meta_proto_pb2.py rename to animalai/animalai/communicator_objects/demonstration_meta_proto_pb2.py diff --git a/animalai/communicator_objects/engine_configuration_proto_pb2.py b/animalai/animalai/communicator_objects/engine_configuration_proto_pb2.py similarity index 100% rename from animalai/communicator_objects/engine_configuration_proto_pb2.py rename to animalai/animalai/communicator_objects/engine_configuration_proto_pb2.py diff --git a/animalai/communicator_objects/header_pb2.py b/animalai/animalai/communicator_objects/header_pb2.py similarity index 100% rename from animalai/communicator_objects/header_pb2.py rename to animalai/animalai/communicator_objects/header_pb2.py diff --git a/animalai/communicator_objects/resolution_proto_pb2.py b/animalai/animalai/communicator_objects/resolution_proto_pb2.py similarity index 100% rename from animalai/communicator_objects/resolution_proto_pb2.py rename to animalai/animalai/communicator_objects/resolution_proto_pb2.py diff --git a/animalai/communicator_objects/space_type_proto_pb2.py b/animalai/animalai/communicator_objects/space_type_proto_pb2.py similarity index 100% rename from animalai/communicator_objects/space_type_proto_pb2.py rename to animalai/animalai/communicator_objects/space_type_proto_pb2.py diff --git a/animalai/communicator_objects/unity_input_pb2.py b/animalai/animalai/communicator_objects/unity_input_pb2.py similarity index 100% rename from animalai/communicator_objects/unity_input_pb2.py rename to animalai/animalai/communicator_objects/unity_input_pb2.py diff --git a/animalai/communicator_objects/unity_message_pb2.py b/animalai/animalai/communicator_objects/unity_message_pb2.py similarity index 100% rename from animalai/communicator_objects/unity_message_pb2.py rename to animalai/animalai/communicator_objects/unity_message_pb2.py diff --git a/animalai/communicator_objects/unity_output_pb2.py b/animalai/animalai/communicator_objects/unity_output_pb2.py similarity index 100% rename from animalai/communicator_objects/unity_output_pb2.py rename to animalai/animalai/communicator_objects/unity_output_pb2.py diff --git a/animalai/communicator_objects/unity_rl_initialization_input_pb2.py b/animalai/animalai/communicator_objects/unity_rl_initialization_input_pb2.py similarity index 100% rename from animalai/communicator_objects/unity_rl_initialization_input_pb2.py rename to animalai/animalai/communicator_objects/unity_rl_initialization_input_pb2.py diff --git a/animalai/communicator_objects/unity_rl_initialization_output_pb2.py b/animalai/animalai/communicator_objects/unity_rl_initialization_output_pb2.py similarity index 100% rename from animalai/communicator_objects/unity_rl_initialization_output_pb2.py rename to animalai/animalai/communicator_objects/unity_rl_initialization_output_pb2.py diff --git a/animalai/communicator_objects/unity_rl_input_pb2.py b/animalai/animalai/communicator_objects/unity_rl_input_pb2.py similarity index 100% rename from animalai/communicator_objects/unity_rl_input_pb2.py rename to animalai/animalai/communicator_objects/unity_rl_input_pb2.py diff --git a/animalai/communicator_objects/unity_rl_output_pb2.py b/animalai/animalai/communicator_objects/unity_rl_output_pb2.py similarity index 100% rename from animalai/communicator_objects/unity_rl_output_pb2.py rename to animalai/animalai/communicator_objects/unity_rl_output_pb2.py diff --git 
a/animalai/communicator_objects/unity_rl_reset_input_pb2.py b/animalai/animalai/communicator_objects/unity_rl_reset_input_pb2.py similarity index 100% rename from animalai/communicator_objects/unity_rl_reset_input_pb2.py rename to animalai/animalai/communicator_objects/unity_rl_reset_input_pb2.py diff --git a/animalai/communicator_objects/unity_rl_reset_output_pb2.py b/animalai/animalai/communicator_objects/unity_rl_reset_output_pb2.py similarity index 100% rename from animalai/communicator_objects/unity_rl_reset_output_pb2.py rename to animalai/animalai/communicator_objects/unity_rl_reset_output_pb2.py diff --git a/animalai/communicator_objects/unity_to_external_pb2.py b/animalai/animalai/communicator_objects/unity_to_external_pb2.py similarity index 100% rename from animalai/communicator_objects/unity_to_external_pb2.py rename to animalai/animalai/communicator_objects/unity_to_external_pb2.py diff --git a/animalai/communicator_objects/unity_to_external_pb2_grpc.py b/animalai/animalai/communicator_objects/unity_to_external_pb2_grpc.py similarity index 100% rename from animalai/communicator_objects/unity_to_external_pb2_grpc.py rename to animalai/animalai/communicator_objects/unity_to_external_pb2_grpc.py diff --git a/animalai/envs/__init__.py b/animalai/animalai/envs/__init__.py similarity index 100% rename from animalai/envs/__init__.py rename to animalai/animalai/envs/__init__.py diff --git a/animalai/envs/arena_config.py b/animalai/animalai/envs/arena_config.py similarity index 94% rename from animalai/envs/arena_config.py rename to animalai/animalai/envs/arena_config.py index 8e48b1a0..a756c493 100644 --- a/animalai/envs/arena_config.py +++ b/animalai/animalai/envs/arena_config.py @@ -94,11 +94,11 @@ def dict_to_arena_config(self) -> UnityRLResetInput: return config_out - def update(self, arenas_configurations_input): + def update(self, arenas_configurations): - if arenas_configurations_input is not None: - for arena_i in arenas_configurations_input.arenas: - self.arenas[arena_i] = copy.copy(arenas_configurations_input.arenas[arena_i]) + if arenas_configurations is not None: + for arena_i in arenas_configurations.arenas: + self.arenas[arena_i] = copy.copy(arenas_configurations.arenas[arena_i]) def constructor_arena(loader, node): diff --git a/animalai/envs/brain.py b/animalai/animalai/envs/brain.py similarity index 100% rename from animalai/envs/brain.py rename to animalai/animalai/envs/brain.py diff --git a/animalai/envs/communicator.py b/animalai/animalai/envs/communicator.py similarity index 100% rename from animalai/envs/communicator.py rename to animalai/animalai/envs/communicator.py diff --git a/animalai/envs/environment.py b/animalai/animalai/envs/environment.py similarity index 97% rename from animalai/envs/environment.py rename to animalai/animalai/envs/environment.py index 6c7fc0ae..5e76c159 100644 --- a/animalai/envs/environment.py +++ b/animalai/animalai/envs/environment.py @@ -53,7 +53,8 @@ def __init__(self, file_name=None, self._loaded = False # If true, this means the environment was successfully loaded self.proc1 = None # The process that is started. 
If None, no process was started self.communicator = self.get_communicator(worker_id, base_port) - self.arenas_configurations = arenas_configurations if arenas_configurations is not None else ArenaConfig() + self.arenas_configurations = arenas_configurations if arenas_configurations is not None \ + else ArenaConfig() if file_name is not None: self.executable_launcher(file_name, docker_training) @@ -168,7 +169,9 @@ def executable_launcher(self, file_name, docker_training): if launch_string is None: self._close() raise UnityEnvironmentException("Couldn't launch the {0} environment. " - "Provided filename does not match any environments." + "Provided filename does not match any environments.\n" + "If you haven't done so already, follow the instructions at: " + "https://github.com/beyretb/AnimalAI-Olympics " .format(true_filename)) else: logger.debug("This is the launch string {}".format(launch_string)) @@ -215,18 +218,18 @@ def __str__(self): return '''Unity Academy name: {0} Number of Brains: {1} Number of Training Brains : {2}'''.format(self._academy_name, str(self._num_brains), - str(self._num_external_brains)) + str(self._num_external_brains)) - def reset(self, arenas_configurations_input=None, train_mode=True) -> AllBrainInfo: + def reset(self, arenas_configurations=None, train_mode=True) -> AllBrainInfo: """ Sends a signal to reset the unity environment. :return: AllBrainInfo : A data structure corresponding to the initial reset state of the environment. """ if self._loaded: - self.arenas_configurations.update(arenas_configurations_input) + self.arenas_configurations.update(arenas_configurations) outputs = self.communicator.exchange( - self._generate_reset_input(train_mode, arenas_configurations_input) + self._generate_reset_input(train_mode, arenas_configurations) ) if outputs is None: raise KeyboardInterrupt diff --git a/animalai/envs/exception.py b/animalai/animalai/envs/exception.py similarity index 100% rename from animalai/envs/exception.py rename to animalai/animalai/envs/exception.py diff --git a/animalai/animalai/envs/gym/environment.py b/animalai/animalai/envs/gym/environment.py new file mode 100644 index 00000000..1846952f --- /dev/null +++ b/animalai/animalai/envs/gym/environment.py @@ -0,0 +1,354 @@ +import logging +from PIL import Image +import itertools +import gym +import numpy as np +from animalai.envs import UnityEnvironment +from gym import error, spaces + + +class UnityGymException(error.Error): + """ + Any error related to the gym wrapper of ml-agents. + """ + pass + + +logging.basicConfig(level=logging.INFO) +logger = logging.getLogger("gym_unity") + + +class AnimalAIEnv(gym.Env): + """ + Provides Gym wrapper for Unity Learning Environments. + Multi-agent environments use lists for object types, as done here: + https://github.com/openai/multiagent-particle-envs + """ + + def __init__(self, + environment_filename: str, + worker_id=0, + docker_training=False, + n_arenas=1, + arenas_configurations=None, + greyscale=False, + retro=True): + """ + Environment initialization + :param environment_filename: The UnityEnvironment path or file to be wrapped in the gym. + :param worker_id: Worker number for environment. + :param docker_training: Whether this is running within a docker environment and should use a virtual + frame buffer (xvfb). 
+ :param n_arenas: number of arenas to create in the environment (one agent per arena) + :param arenas_configurations: an ArenaConfig to configure the items present in each arena, will spawn random + objects randomly if not provided + :param greyscale: whether the visual observations should be grayscaled or not + :param retro: Resize visual observation to 84x84 (int8) and flattens action space. + """ + self._env = UnityEnvironment(file_name=environment_filename, + worker_id=worker_id, + docker_training=docker_training, + n_arenas=n_arenas, + arenas_configurations=arenas_configurations) + # self.name = self._env.academy_name + self.vector_obs = None + self._current_state = None + self._n_agents = None + self._flattener = None + self._greyscale = greyscale or retro + # self._seed = None + self.retro = retro + self.game_over = False # Hidden flag used by Atari environments to determine if the game is over + self.arenas_configurations = arenas_configurations + + self.flatten_branched = self.retro + self.uint8_visual = self.retro + + # Check brain configuration + if len(self._env.brains) != 1: + raise UnityGymException( + "There can only be one brain in a UnityEnvironment " + "if it is wrapped in a gym.") + self.brain_name = self._env.external_brain_names[0] + brain = self._env.brains[self.brain_name] + + if brain.number_visual_observations == 0: + raise UnityGymException("Environment provides no visual observations.") + + if brain.num_stacked_vector_observations != 1: + raise UnityGymException("Environment provides no vector observations.") + + # Check for number of agents in scene. + initial_info = self._env.reset(arenas_configurations=arenas_configurations)[self.brain_name] + self._check_agents(len(initial_info.agents)) + + if self.retro and self._n_agents > 1: + raise UnityGymException("Only one agent is allowed in retro mode, set n_agents to 1.") + + # Set observation and action spaces + if len(brain.vector_action_space_size) == 1: + self._action_space = spaces.Discrete(brain.vector_action_space_size[0]) + else: + if self.flatten_branched: + self._flattener = ActionFlattener(brain.vector_action_space_size) + self._action_space = self._flattener.action_space + else: + self._action_space = spaces.MultiDiscrete(brain.vector_action_space_size) + + # high = np.array([np.inf] * brain.vector_observation_space_size) + self.action_meanings = brain.vector_action_descriptions + + # if self.visual_obs: + if self._greyscale: + depth = 1 + else: + depth = 3 + + if self.retro: + image_space_max = 255 + image_space_dtype = np.uint8 + camera_height = 84 + camera_width = 84 + + image_space = spaces.Box( + 0, image_space_max, + dtype=image_space_dtype, + shape=(camera_height, camera_width, depth) + ) + + self._observation_space = image_space + else: + image_space_max = 1.0 + image_space_dtype = np.float32 + camera_height = brain.camera_resolutions[0]["height"] + camera_width = brain.camera_resolutions[0]["width"] + max_float = np.finfo(np.float32).max + + image_space = spaces.Box( + 0, image_space_max, + dtype=image_space_dtype, + shape=(self._n_agents, camera_height, camera_width, depth) + ) + vector_space = spaces.Box(-max_float, max_float, + shape=(self._n_agents, brain.vector_observation_space_size)) + self._observation_space = spaces.Tuple((image_space, vector_space)) + + def reset(self, arenas_configurations=None): + """Resets the state of the environment and returns an initial observation. + In the case of multi-agent environments, this is a list. 
+ Returns: observation (object/list): the initial observation of the + space. + """ + info = self._env.reset(arenas_configurations=arenas_configurations)[self.brain_name] + n_agents = len(info.agents) + self._check_agents(n_agents) + self.game_over = False + + if self._n_agents == 1: + obs, reward, done, info = self._single_step(info) + else: + obs, reward, done, info = self._multi_step(info) + return obs + + def step(self, action): + """Run one timestep of the environment's dynamics. When end of + episode is reached, you are responsible for calling `reset()` + to reset this environment's state. + Accepts an action and returns a tuple (observation, reward, done, info). + In the case of multi-agent environments, these are lists. + Args: + action (object/list): an action provided by the environment + Returns: + observation (object/list): agent's observation of the current environment + reward (float/list) : amount of reward returned after previous action + done (boolean/list): whether the episode has ended. + info (dict): contains auxiliary diagnostic information, including BrainInfo. + """ + + # Use random actions for all other agents in environment. + if self._n_agents > 1: + if not isinstance(action, list): + raise UnityGymException("The environment was expecting `action` to be a list.") + if len(action) != self._n_agents: + raise UnityGymException( + "The environment was expecting a list of {} actions.".format(self._n_agents)) + else: + if self._flattener is not None: + # Action space is discrete and flattened - we expect a list of scalars + action = [self._flattener.lookup_action(_act) for _act in action] + action = np.array(action) + else: + if self._flattener is not None: + # Translate action into list + action = self._flattener.lookup_action(action) + + info = self._env.step(action)[self.brain_name] + n_agents = len(info.agents) + self._check_agents(n_agents) + self._current_state = info + + if self._n_agents == 1: + obs, reward, done, info = self._single_step(info) + self.game_over = done + else: + obs, reward, done, info = self._multi_step(info) + self.game_over = all(done) + return obs, reward, done, info + + def _single_step(self, info): + + self.visual_obs = self._preprocess_single(info.visual_observations[0][0, :, :, :]) + self.vector_obs = info.vector_observations[0] + + if self._greyscale: + self.visual_obs = self._greyscale_obs_single(self.visual_obs) + + if self.retro: + self.visual_obs = self._resize_observation(self.visual_obs) + default_observation = self.visual_obs + else: + default_observation = self.visual_obs, self.vector_obs + + return default_observation, info.rewards[0], info.local_done[0], { + "text_observation": info.text_observations[0], + "brain_info": info} + + def _preprocess_single(self, single_visual_obs): + if self.uint8_visual: + return (255.0 * single_visual_obs).astype(np.uint8) + else: + return single_visual_obs + + def _multi_step(self, info): + + self.visual_obs = self._preprocess_multi(info.visual_observations) + self.vector_obs = info.vector_observations + + if self._greyscale: + self.visual_obs = self._greyscale_obs_multi(self.visual_obs) + + default_observation = self.visual_obs + + return list(default_observation), info.rewards, info.local_done, { + "text_observation": info.text_observations, + "brain_info": info} + + def _preprocess_multi(self, multiple_visual_obs): + if self.uint8_visual: + return [(255.0 * _visual_obs).astype(np.uint8) for _visual_obs in multiple_visual_obs] + else: + return multiple_visual_obs + + def render(self, 
mode='rgb_array'): + return self.visual_obs + + def close(self): + """Override _close in your subclass to perform any necessary cleanup. + Environments will automatically close() themselves when + garbage collected or when the program exits. + """ + self._env.close() + + def get_action_meanings(self): + return self.action_meanings + + def seed(self, seed=None): + """Sets the seed for this env's random number generator(s). + Currently not implemented. + """ + logger.warning("Could not seed environment %s", self.name) + return + + @staticmethod + def _resize_observation(observation): + """ + Re-sizes visual observation to 84x84 + """ + obs_image = Image.fromarray(observation) + obs_image = obs_image.resize((84, 84), Image.NEAREST) + return np.array(obs_image) + + def _greyscale_obs_single(self, obs): + new_obs = np.floor(np.expand_dims(np.mean(obs, axis=2), axis=2)).squeeze().astype(np.uint8) + return new_obs + + def _greyscale_obs_multi(self, obs): + new_obs = [np.floor(np.expand_dims(np.mean(o, axis=2), axis=2)).squeeze().astype(np.uint8) for o in obs] + return new_obs + + def _check_agents(self, n_agents): + # if n_agents > 1: + # raise UnityGymException( + # "The environment was launched as a single-agent environment, however" + # "there is more than one agent in the scene.") + # elif self._multiagent and n_agents <= 1: + # raise UnityGymException( + # "The environment was launched as a mutli-agent environment, however" + # "there is only one agent in the scene.") + if self._n_agents is None: + self._n_agents = n_agents + logger.info("{} agents within environment.".format(n_agents)) + elif self._n_agents != n_agents: + raise UnityGymException("The number of agents in the environment has changed since " + "initialization. This is not supported.") + + @property + def metadata(self): + return {'render.modes': ['rgb_array']} + + @property + def reward_range(self): + return -float('inf'), float('inf') + + @property + def spec(self): + return None + + @property + def action_space(self): + return self._action_space + + @property + def observation_space(self): + return self._observation_space + + @property + def number_agents(self): + return self._n_agents + + +class ActionFlattener: + """ + Flattens branched discrete action spaces into single-branch discrete action spaces. + """ + + def __init__(self, branched_action_space): + """ + Initialize the flattener. + :param branched_action_space: A List containing the sizes of each branch of the action + space, e.g. [2,3,3] for three branches with size 2, 3, and 3 respectively. + """ + self._action_shape = branched_action_space + self.action_lookup = self._create_lookup(self._action_shape) + self.action_space = spaces.Discrete(len(self.action_lookup)) + + @classmethod + def _create_lookup(self, branched_action_space): + """ + Creates a Dict that maps discrete actions (scalars) to branched actions (lists). + Each key in the Dict maps to one unique set of branched actions, and each value + contains the List of branched actions. + """ + possible_vals = [range(_num) for _num in branched_action_space] + all_actions = [list(_action) for _action in itertools.product(*possible_vals)] + # Dict should be faster than List for large action spaces + action_lookup = {_scalar: _action for (_scalar, _action) in enumerate(all_actions)} + return action_lookup + + def lookup_action(self, action): + """ + Convert a scalar discrete action into a unique set of branched actions. + :param: action: A scalar value representing one of the discrete actions. 
+ :return: The List containing the branched actions. + """ + return self.action_lookup[action] diff --git a/animalai/envs/rpc_communicator.py b/animalai/animalai/envs/rpc_communicator.py similarity index 98% rename from animalai/envs/rpc_communicator.py rename to animalai/animalai/envs/rpc_communicator.py index aa082305..ddc48ffd 100644 --- a/animalai/envs/rpc_communicator.py +++ b/animalai/animalai/envs/rpc_communicator.py @@ -74,7 +74,7 @@ def check_port(self, port): s.close() def initialize(self, inputs: UnityInput) -> UnityOutput: - if not self.unity_to_external.parent_conn.poll(3000): + if not self.unity_to_external.parent_conn.poll(90): raise UnityTimeOutException( "The Unity environment took too long to respond. Make sure that :\n" "\t The environment does not need user interaction to launch\n" diff --git a/animalai/envs/socket_communicator.py b/animalai/animalai/envs/socket_communicator.py similarity index 100% rename from animalai/envs/socket_communicator.py rename to animalai/animalai/envs/socket_communicator.py diff --git a/animalai/setup.py b/animalai/setup.py new file mode 100644 index 00000000..6a334ded --- /dev/null +++ b/animalai/setup.py @@ -0,0 +1,30 @@ +from setuptools import setup + +setup( + name='animalai', + version='0.5.0', + description='Animal AI competition interface', + url='https://github.com/beyretb/AnimalAI-Olympics', + author='Benjamin Beyret', + author_email='bb1010@ic.ac.uk', + + classifiers=[ + 'Intended Audience :: Developers', + 'Topic :: Scientific/Engineering :: Artificial Intelligence', + 'License :: OSI Approved :: Apache Software License', + 'Programming Language :: Python :: 3.6' + ], + + packages=['animalai.envs', 'animalai.envs.gym', 'animalai.communicator_objects'], # Required + zip_safe=False, + + install_requires=[ + 'Pillow>=4.2.1,<=5.4.1', + 'numpy>=1.13.3,<=1.14.5', + 'protobuf>=3.6,<3.7', + 'grpcio>=1.11.0,<1.12.0', + 'pyyaml>=5.1', + 'jsonpickle>=1.2', + 'gym'], + python_requires=">=3.5,<3.8", +) \ No newline at end of file diff --git a/documentation/PrefabsPictures/Arena.png b/documentation/PrefabsPictures/Arena.png index 299e0e7f..0523197a 100644 Binary files a/documentation/PrefabsPictures/Arena.png and b/documentation/PrefabsPictures/Arena.png differ diff --git a/documentation/PrefabsPictures/Rewards/HotZone.png b/documentation/PrefabsPictures/Rewards/HotZone.png new file mode 100644 index 00000000..cdebd2df Binary files /dev/null and b/documentation/PrefabsPictures/Rewards/HotZone.png differ diff --git a/documentation/configFile.md b/documentation/configFile.md index 33a3f979..a06a30ad 100644 --- a/documentation/configFile.md +++ b/documentation/configFile.md @@ -14,7 +14,8 @@ To configure training arenas you can use a simple **YAML file** and/or the **Are - on the fly changes of configuration of one or more arenas between episodes, allowing for easy curriculum learning for example - share configurations between participants -We provide a few custom configurations, but we expect designing good environments will be an important component of doing well in the competition. +We provide a few custom configurations, but we expect designing good environments will be an important component of doing + well in the competition. We describe below the structure of the configuration file for an instance of the training environment, as well as all the parameters and the values they can take. For how to change the configuration during training see `animalai/envs/ArenaConfig.py`. @@ -25,17 +26,19 @@ parameters and the values they can take. 
For how to change the configuration dur

-A single arena is as shown above, it comes with a single agent (blue cube, black dot showing the front), a floor and four walls. It is a square of size 40x40, the -origin of the arena is `(0,0)`, therefore you can provide coordinates for objects in the range `[0,40]x[0,40]` as floats. +A single arena is as shown above, it comes with a single agent (blue sphere, black dot showing the front), a floor and +four walls. It is a square of size 40x40, the origin of the arena is `(0,0)`. You can provide coordinates for +objects in the range `[0,40]x[0,40]` as floats. For visualization you can only configure a single arena, however during training you can configure as many as you want, each will have its local set of coordinates as described above. For a single arena you can provide the following parameters: - `t` an `int`, the length of an episode which can change from one episode to the other. A value of `0` means that the episode will -not terminate unlti a reward has been collected (setting `t=0` and having no reward will lead to an infinite episode) +not terminate until a reward has been collected (setting `t=0` and having no reward will lead to an infinite episode) - `blackouts` [see below](#blackouts) +Note that in Unity the **y** axis is the vertical axis. In the above picture, with the agent on the ground in the center of the environment, its coordinates are (20, 0, 20). ## Objects @@ -43,29 +46,28 @@ All the objects that will be used during training are provided to you for training. All objects can be configured in the same manner, using a set of parameters for each item: -- `name`: the name of the object you want to spawn +- `name`: the name of the object you want to spawn. - `positions`: a list of `Vector3` positions within the arena where you want to spawn items, if the list -is empty the position will be sampled randomly in the arena -- `sizes`: a list of `Vector3` sizes, if the list is empty the size will be sampled randomly -- `rotations`: a list of `float` in the range `[0,360]`, if the list is empty the rotation is sampled randomly -- `colors`: a list of `RGB` values (integers in the range `[0,255]`), if the list is empty the color is sampled randomly +is empty the position will be sampled randomly in the arena. Any position dimension set to -1 will spawn randomly. +- `sizes`: a list of `Vector3` sizes, if the list is empty the size will be sampled randomly. You can set any size to -1 to spawn randomly along that dimension only. +- `rotations`: a list of `float` in the range `[0,360]`, if the list is empty the rotation is sampled randomly. +- `colors`: a list of `RGB` values (integers in the range `[0,255]`), if the list is empty the color is sampled randomly. -Any of these fields can be omitted in the configuration files, in which case the omitted fields are automatically randomized. +Any of these fields can be omitted in the configuration files, in which case the omitted fields are automatically randomized. Any Vector3 that contains a -1 for any of its dimensions will spawn that dimension randomly. This can be used to spawn, for example, multiple walls of a set width and height but random lengths. -**All values for the above fields can be found in [the definitions](definitionsOfObjects.md)**. +**All value ranges for the above fields can be found in [the definitions](definitionsOfObjects.md)**. If you go above or below the range for size it will automatically be set to the max or min respectively.
If you try to spawn outside the arena (or overlapping with another object) then nothing will spawn. ## Blackouts -Blackouts are parameters you can pass to each arena, which define between which frames of an episode should the lights -be on or off. If omitted, this parameter automatically sets to have lights on for the entire episode. You can otherwise +Blackouts are parameters you can pass to each arena, which define between which frames of an episode the lights are +on or off. If omitted, this parameter automatically sets to have lights on for the entire episode. You can otherwise pass two types of arguments for this parameter: - passing a list of frames `[5,10,15,20,25]` will start with the lights on, switch them off from frames 5 to 9 included, then back on from 15 to 19 included etc... - passing a single negative argument `[-20]` will automatically switch lights on and off every 20 frames. -**Note**: for infinite episodes (where `t=0`), the first point above would leave the light off after frame `25` while the -second point would keep switching the lights every `20` frame indefinitely. +**Note**: for infinite episodes (where `t=0`), the first point above would leave the light off after frame `25` while the second point would keep switching the lights every `20` frames indefinitely. ## Rules and Notes @@ -73,25 +75,19 @@ There are certain rules to follow when configuring and arena as well as some des configuration file does not behave as you expect make sure you're not breaking one of the following: - Spawning objects: - - **Objects can only spawn if they do not overlap with each other** - - Attempting to spawn an object where another object already is will discard the latter. - - The environment will attempt to spawn objects in the order they are provided in the file. In the case where any of the - components is randomized we attempt to spawn the object **up to 20 times**. if no valid spawning spot is found the object is discarded. - - Due to the above point, the first objects in the list are more likely to spawn than the last ones + - **Objects can only spawn if they do not overlap with each other**. Attempting to spawn an object where another object already is will discard the latter. + - The environment will attempt to spawn objects in the order they are provided in the file. In the case where any of the components is randomized we attempt to spawn the object **up to 20 times**. if no valid spawning spot is found the object is discarded. + - Due to the above point, the first objects in the list are more likely to spawn than the last ones. - The `Agent` does not have to be provided in the configuration file, in which case it will spawn randomly. - - If an `Agent` position is provided, be aware that the **agent spawns last** therefore it might cause problems if other objects - randomly spawn where the agent should be - - In case an object is present where the `Agent` should spawn the arena resets and the process starts all over - - You can **spawn some objects on top of each others**, however be aware there is a `0.1` buffer automatically added to any height - you provide (to make sure things fall on each others nicely). + - If an `Agent` position is provided, be aware that the **agent spawns last** therefore it might cause problems if other objects randomly spawn where the agent should be. + - In case an object is present where the `Agent` should spawn the arena resets and the process starts all over. 
+    - You can **spawn some objects on top of each other**, however be aware there is a `0.1` buffer automatically added to any height you provide (to make sure things fall on each other nicely).
 
 - Configuration file values:
-    - Objects' `name` have to match one of the names provided in [the definitions](definitionsOfObjects.md), if the name provided is not
-    found in this list, the object is ignored
-    - Any component of `positions`, `sizes` and `rotations` can be randomized by providing a value sof `-1`.
+    - Objects' `name` has to match one of the names provided in [the definitions](definitionsOfObjects.md), if the name provided is not found in this list, the object is ignored.
+    - Any component of `positions`, `sizes` and `rotations` can be randomized by providing a value of `-1`.
     - Note that setting `positions.y = -1` will spawn the object at ground level.
-    - Goals (except for the red zone) can only be scaled equally on all axes, therefore they will always remain spheres. If
-    a `Vector3` is provided for the scale of a sphere goal only the `x` component is used to scale all axes equally.
+    - Goals (except for the red zone) can only be scaled equally on all axes, therefore they will always remain spheres. If a `Vector3` is provided for the scale of a sphere goal only the `x` component is used to scale all axes equally.
 
 ## Detailed example
 
@@ -123,15 +119,11 @@ arenas:
       name: GoodGoal
 ```
 
-First of all, we can see that the number of parameters for `positions`, `rotations` and `sizes` do not need to match. The
-environment will spawn `max( len(positions), len(rotations), len(sizes) )` objects, where `len()` is the length of the list.
-Any parameter missing will be sampled randomly.
+First of all, we can see that the number of parameters for `positions`, `rotations` and `sizes` do not need to match. The environment will spawn `max( len(positions), len(rotations), len(sizes) )` objects, where `len()` is the length of the list. Any missing parameter will correspond to a randomly generated value.
 
-In this case this will lead to:
-- a pink `Cube` spawned in `[10,10]` on the ground with rotation `45` and a size randomized on both `x` and `z` and of `y=5`
-- a `Cube` spawned on the ground, with a random `x` and `z=30`, its rotation, size and color will be random
-- three pink `CylinderTunnel` completely randomized
-- a `GoodGoal` randomized
-- the agent which position and rotation are randomized too
-
-The arena will spawn these objects in this order.
\ No newline at end of file
+In this case this will lead to (in the order that they will spawn):
+- a pink `Cube` spawned at `[10,10]` on the ground with rotation `45` and a size randomized on both `x` and `z` and of `y=5`.
+- a `Cube` spawned on the ground, with a random `x` and `z=30`. Its rotation, size and color will be random.
+- three pink `CylinderTunnel` completely randomized.
+- a `GoodGoal` randomized.
+- the agent with position and rotation randomized.
diff --git a/documentation/definitionsOfObjects.md b/documentation/definitionsOfObjects.md
index c0610a8f..19173534 100644
--- a/documentation/definitionsOfObjects.md
+++ b/documentation/definitionsOfObjects.md
@@ -5,21 +5,18 @@ The objects you can spawn in an arena are split among three categories:
 - immovable
 - rewards
 
-Below is a list of objects you can spawn. For each we describe the name you should use to refer to in your configuration files
-or in Python directly, as well as their default characteristics and range of values you can assign to them. **All objects can
-be rotated `360` degrees.**
+Below is a list of objects you can spawn. For each we describe the name you should use in your configuration files
+or in Python directly, as well as their default characteristics and the range of values you can assign to them. **All objects can be rotated `360` degrees.**
 
-Each object has an orientation, we provide the three axes for all of those that are not symmetrical. The color code of the
-axes is as depicted below:
+Each object has an orientation, we provide the three axes for all of those that are not symmetrical. The color code of the axes is as depicted below:
 
-**Note:** as depicted above the vertical axis is th **Y axis**, we will use Z as the forward axis (both conventions are
-the ones used in Unity).
+**Note:** the **Y axis** is the vertical axis and **Z** is the forward axis (following conventions used in Unity).
 
 #### Immovable
 
-These are objects that are fixed and will not be impacted by the agent or other objects:
+These objects are fixed and cannot be moved:
 
 - a rectangular tunnel
     - name: `CubeTunnel`
@@ -50,7 +47,7 @@ These are objects that are fixed and will not be impacted by the agent or other
 
 #### Movable
 
-These are objects the agent can move and which will be affected by each other, fixed objects and rewards if they collide
+These are objects the agent can move and which will be affected by each other, fixed objects and rewards if they collide. Note that different object types weigh different amounts. It is easier to push a cardboard box than a cube the size of a wall.
 
 - a cube that can be pushed
     - name: `Cube`
@@ -87,8 +84,7 @@ These are objects the agent can move and which will be affected by each other, f
 
 #### Rewards
 
-Objects that may terminate the event if the agents collides with one. **Important note:** for sphere goals the `y` and `z`
-components of the provided sizes are ignored and only the `x` one counts
+Objects that give a reward and may terminate the episode if the agent collides with one. **Important note:** for sphere goals the `y` and `z` components of the provided sizes are ignored and only the `x` one counts.
 
 - Good goals: green spheres with a positive reward equal to their size, terminate an episode
 
@@ -139,9 +135,16 @@ components of the provided sizes are ignored and only the `x` one counts
     - name: `GoodGoalMultiBounce`
     - size range `1-3`
 - Deathzone:
-    - a a deathzone with reward -1
+    - a red zone with reward -1 that ends an episode
     - name: `DeathZone`
     - size range `(1,0,1)-(40,0,40)`
     - **the deathzone is always flat and located on the ground**
-    - terminates episode
-
+    - terminates an episode
+- HotZone:
+    - an orange zone with reward `min(-3/T,-1e-5)` that **does not** end an episode
+    - name: `HotZone`
+    - size range `(1,0,1)-(40,0,40)`
+    - **the hotzone is always flat and located on the ground**
+    - does not terminate an episode
diff --git a/documentation/quickstart.md b/documentation/quickstart.md
index f4fe1bd9..7d2b0634 100644
--- a/documentation/quickstart.md
+++ b/documentation/quickstart.md
@@ -1,38 +1,44 @@
 # Quick Start Guide
 
-You can run the Animal AI environment in three different ways:
-- running the the standalone `AnimalAI` executable
-- running a configuration file via `visualizeArena.py`
-- start training using `train.py`
+The format of this competition may be a little different from the standard machine learning setup. We do not provide a single training set that you can train on out of the box and we do not provide full information about the testing set in advance.
Instead, you will need to choose for yourself what you expect to be useful configurations of our training environment in order to train an agent capable of robust food retrieval behaviour.
+
+To facilitate working with this new paradigm we created tools you can use to easily set up and visualize your training environment.
 
 ## Running the standalone arena
 
-Running the executable `AnimalAI` that you should have separately downloaded and added to the `envs` folder starts a
-playable environment with default configurations in a single arena. You can toggle the camera between First Person and
-Bird's eye view using the `C` key on your keyboard. The agent can then be controlled using `W,A,S,D` on your keyboard.
-The objects present in the configuration are randomly sampled from the list of objects that can be spawned, their
-location is random too. Hitting `R` or collecting rewards will reset the arena.
+The basic environment is made of a single agent in an enclosed arena that resembles the kind of setup used for experimenting with animals. In this environment you can add objects the agent can interact with, as well as goals or rewards the agent must collect or avoid. To see what this looks like, you can run the executable environment directly. This will spawn an arena filled with randomly placed objects. Of course, this is a very messy environment to begin training on, so we provide a configuration file where you choose what to spawn (see below).
 
-**Note**: on some platforms, running the standalone arena in full screen makes the environment slow, keep the
-environment in window mode for better performance.
+You can toggle the camera between First Person and Bird's eye view using the `C` key on your keyboard. The agent can
+then be controlled using `W,A,S,D` on your keyboard. Hitting `R` or collecting certain rewards (green or red) will reset the arena.
 
 ## Running a specific configuration file
 
-The `visualizeArena.py` script found in the main folder allows you to visualize an arena configuration file. We provide
-sample configuration files for you to experiment with. To make your own environment configuration file we advise to read
-thoroughly the [configuration file documentation page](configFile.md). You will find a detailed list of all the objects on the [definitions of objects page](definitionsOfObjects.md). Running this script only allows for a single arena to be visualized at once, as there can only be a single agent you control.
-
-For example, to run an environment that contains the agent, a goal, and some randomly placed walls use:
+Once you are familiar with the environment and its physics, you can start building and visualizing your own. Assuming you followed the [installation instructions](../README.md#requirements), go to the `examples/` folder and run
+`python visualizeArena.py configs/exampleConfig.yaml`. This loads the `configs/exampleConfig.yaml` configuration for the
+arena and lets you play as the agent.
 
-```
-python visualizeArena.py configs/obstacles.yaml
-```
+Have a look at the [configuration file](configs/exampleConfig.yaml) which specifies the objects to place. You can select
+objects, their size, location, rotation and color, randomizing any of these parameters as you like. For more details on the configuration options and syntax please read the relevant documentation:
+ - The [configuration file documentation page](configFile.md) which explains how to write the configuration files.
+ - The [definitions of objects page](definitionsOfObjects.md) which contains a detailed list of all the objects and their characteristics.
 
 ## Start training your agent
 
-Once you're happy with your arena configuration you can start training your agent. This can be done in a way very similar
-to a regular [gym](https://github.com/openai/gym) environment. We provide a template training file `train.py` you can run
-out of the box, it uses the [ML agents' PPO](https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Training-PPO.md)
-for training. We added the ability for participants to **change the environment configuration between episodes**. You can
-find more details about that in the [training documentation](training.md).
+Once you're happy with your arena configurations you can start training your agent. The `animalai` package includes several features to help with this:
+
+- It is possible to **change the environment configuration between episodes** (allowing for techniques such as curriculum learning).
+- You can **choose the length of each episode** as part of the configuration files, even having infinite episodes.
+- You can **have several arenas in a single environment instance**, each with an agent you control independently from the others, and each with its own configuration, allowing you to collect observations faster.
+
+We provide examples of training using the `animalai-train` package; you can of course start from scratch and submit agents that do not rely on this library. To show how to train on an `animalai` environment we provide scripts in the
+`examples/` folder:
+
+- `trainDopamine.py` uses the `dopamine` implementation of Rainbow to train a single agent using the gym interface. This
+is a good starting point if you want to try another training algorithm that works plug-and-play with Gym. **Note that using the gym interface only allows for training with a single arena and agent in the environment at a time.** You can create a gym environment with several agents, but this will require modifying your code to accept more than one observation at a time.
+- `trainMLAgents.py` uses the `ml-agents` implementation of PPO to train one or more agents at a time, using the
+`UnityEnvironment`. This is a great starting point if you don't mind reading some code, as it directly allows you to use the
+functionalities described above, out of the box.
+
+You can find more details about this in the [training documentation](training.md).
diff --git a/documentation/training.md b/documentation/training.md
index 0d895c51..1c9980a8 100644
--- a/documentation/training.md
+++ b/documentation/training.md
@@ -2,33 +2,43 @@
 
 ## Overview
 
-Training happens very much like with a regular gym environment. We provide you with both the compiled
-environment and the Python libraries needed for training. You will also find an example of training agent
-using [ML-Agents' PPO](https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Training-PPO.md).
+The `animalai` package offers two kinds of interfaces to use for training: a gym environment and an ml-agents one. We
+also provide the `animalai-train` package to showcase how training and submissions work. This can serve as a starting
+point for your own code; however, you are not required to use this package at all for submissions.
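
As a rough feel for the gym-style interface mentioned above, here is a minimal random-action loop. It is only a sketch: the constructor arguments are copied from `examples/trainDopamine.py`, and the standard gym calls (`reset`, `step`, `action_space`) are assumed to behave as the wrapper advertises.

```
import random

from animalai.envs.gym.environment import AnimalAIEnv
from animalai.envs.arena_config import ArenaConfig

# Constructor arguments as used in examples/trainDopamine.py.
env = AnimalAIEnv(environment_filename='../env/AnimalAI',
                  worker_id=random.randint(1, 100),
                  n_arenas=1,
                  arenas_configurations=ArenaConfig('configs/justFood.yaml'),
                  retro=True)

obs = env.reset()
for _ in range(100):
    # Sample a random action and step the single agent.
    obs, reward, done, info = env.step(env.action_space.sample())
    if done:
        obs = env.reset()
env.close()
```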
+ +If you are not familiar with these algorithms, have a look at +[ML-Agents' PPO](https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Training-PPO.md), as well as +[dopamine's Rainbow](https://google.github.io/dopamine/). ## Observations and actions Before looking at the environment itself, we define here the actions the agent can take and the observations it collects: -- **Actions**: the agent can move forward/backward and rotate left/right, just like in play mode. Therefore the -actions are discrete of dimension `2`, each component can take 3 values (`(nothing, forward, backward)` and `(nothing, left,right)`). +- **Actions**: the agent can move forward/backward and rotate left/right, just like in play mode. The +actions are discrete and of dimension `2`, each component can take 3 values (`(nothing, forward, backward)` and `(nothing, +left,right)`). - **Observations** are made of two components: visual observations which are pixel based and of dimension `84x84x3`, as -well as the speed of the agent which is continuous of dimension `3` (speed along axes `(x,y,z)` in this order). +well as the speed of the agent which is continuous of dimension `3` (speed along axes `(x,y,z)` in this order). Of course, you may want to process and/or scale down the input before use with your approach. - **Rewards**: in case of an episode of finite length `T`, each step carries a small negative reward `-1/T`. In case of -an episode with no time limite (`T=0`), no reward is returned for each step. Other rewards come from the rewards objects +an episode with no time limit (`T=0`), no reward is returned for each step. Other rewards come from the rewards objects (see details [here](definitionsOfObjects.md)). ## The Unity Environment Much like a gym environment, you can create a `UnityEnvironment` that manages all communications with -the environment. You will first need to instantiate the environement, you can then reset it, take steps and collect -observations. All the codebase for this is in `animalai/envs/environment.py`. Below is a quick description of these components. +the environment. You will first need to instantiate the environment, you can then reset it, take steps and collect +observations. All the codebase for this is in `animalai/envs/environment.py`. Below is a quick description of these +components. + +We provide an example of training using `UnityEnvironment` in `examples/trainMLAgents.py`. ### Instantiation For example, you can call:: ``` +from animalai.envs import UnityEnvironment + env= UnityEnvironment( file_name='env/AnimalAI', # Path to the environment worker_id=1, # Unique ID for running the environment (used for connection) @@ -41,17 +51,17 @@ env= UnityEnvironment( ``` Note that the path to the executable file should be stripped of its extension. The `no_graphics` parameter should always -be set to `False` as it impacts the collection of visual obeservations, which we rely on. +be set to `False` as it impacts the collection of visual observations, which we rely on. ### Reset -We have modified this functionality compared to the mlagents codebase. Here we add the possibility to pass a new `ArenaConfiguration` -as an argument to reset the environment. The environment will use the new configuration for this reset, as well as all the -following ones until a new configuration is passed. The syntax is: +We have modified this functionality compared to the mlagents codebase. Here we add the possibility to pass a new +`ArenaConfiguration` as an argument to reset the environment. 
The environment will use the new configuration for this
+reset, as well as all the following ones until a new configuration is passed. The syntax is:
 
 ```
-env.reset(arenas_configurations_input=arena_config,  # A new ArenaConfig to use for reset, leave empty to use the last one provided
-          train_mode=True                            # True for training
+env.reset(arenas_configurations=arena_config,  # A new ArenaConfig to use for reset, leave empty to use the last one provided
+          train_mode=True                      # True for training
           )
 ```
 
@@ -65,8 +75,6 @@ For example, if you only want to modify arena number 3, you could create an `Are
 arenas:
   3: !Arena
     t: 0
-    rand_all_colors: true
-    rand_all_sizes: true
     items:
     - !Item
       (...)
 
 ### Step
 
-Taking a step returns a data structure named `BrainInfo` which is defined in `animalai/envs/brain` and basically contains
-all the information returned by the environment after taking a step, including the observations. For example:
+Taking a step returns a data structure named `BrainInfo` which is defined in `animalai/envs/brain` and basically contains all the information returned by the environment including the observations. For example:
 
 ```
 info = env.step(vector_action=take_action_vector)
 ```
 
-You can pass more parameters to the environment depending on what you need for training, to learn about this and the
+This line will return all the data needed for training, in our case where `n_arenas=4` you will get:
+
+```
+brain = info['Learner']
+
+brain.visual_observations   # list of 4 pixel observations, each of size (84x84x3)
+brain.vector_observation    # list of 4 speeds, each of size 3
+brain.reward                # list of 4 float rewards
+brain.local_done            # list of 4 booleans to flag if each agent is done or not
+```
+
+You can pass more parameters to the environment depending on what you need for training. To learn about this and the
 format of the `BrainInfo`, see the
 [official ml-agents' documentation](https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Python-API.md#interacting-with-a-unity-environment).
 
 ### Close
 
-Don't forget to close the environment once training is done so that all communications are closed properly and ports
+Don't forget to close the environment once training is done so that all communications are terminated properly and ports
 are not left open (which can prevent future connections).
 
 ```
 env.close()
 ```
 
+## Gym wrapper
+
+We also provide a gym wrapper to implement the OpenAI interface in order to directly plug in baselines and start training.
+One limitation of this implementation is the use of a single agent per environment. This will let you collect fewer
+observations per episode and therefore make training slower. A later release might fix this and allow for multiple agents.
+
+We provide an example of training using gym in `examples/trainDopamine.py`.
+
 ## Notes
 
 Some important points to note for training:
diff --git a/examples/README.md b/examples/README.md
new file mode 100644
index 00000000..fa509c82
--- /dev/null
+++ b/examples/README.md
@@ -0,0 +1,37 @@
+# Visualization and Training
+
+We provide in this folder a few examples for competing in the AnimalAI Olympics. You will first of all need to set up
+a training environment with a specific configuration. For this part we provide a script to visualize your configurations.
+You will then need to train an agent on this configuration, which can be done however you prefer; we provide here two
+examples, one for each interface.
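
Before diving into the two example scripts, the shape of the ml-agents style loop that `trainMLAgents.py` builds on is roughly the following. This is a sketch only: the parameter and `BrainInfo` field names are taken from `documentation/training.md` above rather than verified against the code, so treat them as assumptions.

```
import numpy as np

from animalai.envs import UnityEnvironment
from animalai.envs.arena_config import ArenaConfig

# Four arenas in one environment instance, as in the documentation example.
env = UnityEnvironment(file_name='../env/AnimalAI', worker_id=1, n_arenas=4)
env.reset(arenas_configurations=ArenaConfig('configs/exampleTraining.yaml'),
          train_mode=True)

for _ in range(100):
    # One discrete action per arena: [forward/backward, left/right], each in {0, 1, 2}.
    actions = np.random.randint(0, 3, size=(4, 2))
    info = env.step(vector_action=actions)
    brain = info['Learner']                     # field names as documented above
    rewards, dones = brain.reward, brain.local_done

env.close()
```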
+
+## Visualizing configurations
+
+Once you have [created a configuration file](../documentation/configFile.md), you may want to see what it actually looks
+like. To do so you can simply run:
+
+```
+python visualizeArena.py configs/exampleConfig.yaml
+```
+
+replacing `exampleConfig.yaml` with the name of your file(s). Once this is launched, you can control the agent using the
+same keystrokes as described [here](../README.md#manual-control).
+
+We also provide an example of what switching lights on/off looks like for the agent and how to configure this feature.
+Run `python visualizeLightsOff.py` and read `configs/lightsOff.yaml` to see how four different agents in the same
+environment can have different light setups.
+
+## Training agents
+
+We strongly encourage you to read the code in the training files to familiarize yourself with the syntax of the two
+packages we provide. We will also release Jupyter notebooks in a future release to make this step more straightforward.
+
+### Using the ML Agents interface
+
+You can run `python trainMLAgents.py` to start training using PPO and the default configuration
+`configs/exampleTraining.yaml`. This script instantiates 4 agents in a single environment, therefore collecting more
+observations at once and speeding up training.
+
+### Using the Gym interface
+
+Run `python trainDopamine.py` to train a single agent with Rainbow, using the Gym interface and Dopamine.
diff --git a/examples/animalai_train/LICENSE b/examples/animalai_train/LICENSE
new file mode 100644
index 00000000..7ff5035e
--- /dev/null
+++ b/examples/animalai_train/LICENSE
@@ -0,0 +1,201 @@
+                                 Apache License
+                           Version 2.0, January 2004
+                        http://www.apache.org/licenses/
+
+   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+
+   1. Definitions.
+
+      "License" shall mean the terms and conditions for use, reproduction,
+      and distribution as defined by Sections 1 through 9 of this document.
+
+      "Licensor" shall mean the copyright owner or entity authorized by
+      the copyright owner that is granting the License.
+
+      "Legal Entity" shall mean the union of the acting entity and all
+      other entities that control, are controlled by, or are under common
+      control with that entity. For the purposes of this definition,
+      "control" means (i) the power, direct or indirect, to cause the
+      direction or management of such entity, whether by contract or
+      otherwise, or (ii) ownership of fifty percent (50%) or more of the
+      outstanding shares, or (iii) beneficial ownership of such entity.
+
+      "You" (or "Your") shall mean an individual or Legal Entity
+      exercising permissions granted by this License.
+
+      "Source" form shall mean the preferred form for making modifications,
+      including but not limited to software source code, documentation
+      source, and configuration files.
+
+      "Object" form shall mean any form resulting from mechanical
+      transformation or translation of a Source form, including but
+      not limited to compiled object code, generated documentation,
+      and conversions to other media types.
+
+      "Work" shall mean the work of authorship, whether in Source or
+      Object form, made available under the License, as indicated by a
+      copyright notice that is included in or attached to the work
+      (an example is provided in the Appendix below).
+ + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other modifications + represent, as a whole, an original work of authorship. For the purposes + of this License, Derivative Works shall not include works that remain + separable from, or merely link (or bind by name) to the interfaces of, + the Work and Derivative Works thereof. + + "Contribution" shall mean any work of authorship, including + the original version of the Work and any modifications or additions + to that Work or Derivative Works thereof, that is intentionally + submitted to Licensor for inclusion in the Work by the copyright owner + or by an individual or Legal Entity authorized to submit on behalf of + the copyright owner. For the purposes of this definition, "submitted" + means any form of electronic, verbal, or written communication sent + to the Licensor or its representatives, including but not limited to + communication on electronic mailing lists, source code control systems, + and issue tracking systems that are managed by, or on behalf of, the + Licensor for the purpose of discussing and improving the Work, but + excluding communication that is conspicuously marked or otherwise + designated in writing by the copyright owner as "Not a Contribution." + + "Contributor" shall mean Licensor and any individual or Legal Entity + on behalf of whom a Contribution has been received by Licensor and + subsequently incorporated within the Work. + + 2. Grant of Copyright License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + copyright license to reproduce, prepare Derivative Works of, + publicly display, publicly perform, sublicense, and distribute the + Work and such Derivative Works in Source or Object form. + + 3. Grant of Patent License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + (except as stated in this section) patent license to make, have made, + use, offer to sell, sell, import, and otherwise transfer the Work, + where such license applies only to those patent claims licensable + by such Contributor that are necessarily infringed by their + Contribution(s) alone or by combination of their Contribution(s) + with the Work to which such Contribution(s) was submitted. If You + institute patent litigation against any entity (including a + cross-claim or counterclaim in a lawsuit) alleging that the Work + or a Contribution incorporated within the Work constitutes direct + or contributory patent infringement, then any patent licenses + granted to You under this License for that Work shall terminate + as of the date such litigation is filed. + + 4. Redistribution. 
You may reproduce and distribute copies of the + Work or Derivative Works thereof in any medium, with or without + modifications, and in Source or Object form, provided that You + meet the following conditions: + + (a) You must give any other recipients of the Work or + Derivative Works a copy of this License; and + + (b) You must cause any modified files to carry prominent notices + stating that You changed the files; and + + (c) You must retain, in the Source form of any Derivative Works + that You distribute, all copyright, patent, trademark, and + attribution notices from the Source form of the Work, + excluding those notices that do not pertain to any part of + the Derivative Works; and + + (d) If the Work includes a "NOTICE" text file as part of its + distribution, then any Derivative Works that You distribute must + include a readable copy of the attribution notices contained + within such NOTICE file, excluding those notices that do not + pertain to any part of the Derivative Works, in at least one + of the following places: within a NOTICE text file distributed + as part of the Derivative Works; within the Source form or + documentation, if provided along with the Derivative Works; or, + within a display generated by the Derivative Works, if and + wherever such third-party notices normally appear. The contents + of the NOTICE file are for informational purposes only and + do not modify the License. You may add Your own attribution + notices within Derivative Works that You distribute, alongside + or as an addendum to the NOTICE text from the Work, provided + that such additional attribution notices cannot be construed + as modifying the License. + + You may add Your own copyright statement to Your modifications and + may provide additional or different license terms and conditions + for use, reproduction, or distribution of Your modifications, or + for any such Derivative Works as a whole, provided Your use, + reproduction, and distribution of the Work otherwise complies with + the conditions stated in this License. + + 5. Submission of Contributions. Unless You explicitly state otherwise, + any Contribution intentionally submitted for inclusion in the Work + by You to the Licensor shall be under the terms and conditions of + this License, without any additional terms or conditions. + Notwithstanding the above, nothing herein shall supersede or modify + the terms of any separate license agreement you may have executed + with Licensor regarding such Contributions. + + 6. Trademarks. This License does not grant permission to use the trade + names, trademarks, service marks, or product names of the Licensor, + except as required for reasonable and customary use in describing the + origin of the Work and reproducing the content of the NOTICE file. + + 7. Disclaimer of Warranty. Unless required by applicable law or + agreed to in writing, Licensor provides the Work (and each + Contributor provides its Contributions) on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or + implied, including, without limitation, any warranties or conditions + of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A + PARTICULAR PURPOSE. You are solely responsible for determining the + appropriateness of using or redistributing the Work and assume any + risks associated with Your exercise of permissions under this License. + + 8. Limitation of Liability. 
In no event and under no legal theory, + whether in tort (including negligence), contract, or otherwise, + unless required by applicable law (such as deliberate and grossly + negligent acts) or agreed to in writing, shall any Contributor be + liable to You for damages, including any direct, indirect, special, + incidental, or consequential damages of any character arising as a + result of this License or out of the use or inability to use the + Work (including but not limited to damages for loss of goodwill, + work stoppage, computer failure or malfunction, or any and all + other commercial damages or losses), even if such Contributor + has been advised of the possibility of such damages. + + 9. Accepting Warranty or Additional Liability. While redistributing + the Work or Derivative Works thereof, You may choose to offer, + and charge a fee for, acceptance of support, warranty, indemnity, + or other liability obligations and/or rights consistent with this + License. However, in accepting such obligations, You may act only + on Your own behalf and on Your sole responsibility, not on behalf + of any other Contributor, and only if You agree to indemnify, + defend, and hold each Contributor harmless for any liability + incurred by, or claims asserted against, such Contributor by reason + of your accepting any such warranty or additional liability. + + END OF TERMS AND CONDITIONS + + APPENDIX: How to apply the Apache License to your work. + + To apply the Apache License to your work, attach the following + boilerplate notice, with the fields enclosed by brackets "{}" + replaced with your own identifying information. (Don't include + the brackets!) The text should be enclosed in the appropriate + comment syntax for the file format. We also recommend that a + file or class name and description of purpose be included on the + same "printed page" as the copyright notice for easier + identification within third-party archives. + + Copyright 2017 Unity Technologies + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. \ No newline at end of file diff --git a/examples/animalai_train/README.md b/examples/animalai_train/README.md new file mode 100644 index 00000000..85bbd236 --- /dev/null +++ b/examples/animalai_train/README.md @@ -0,0 +1,8 @@ +# AnimalAI Python API + +This package provides training libraries for training agents for the Animal AI Olympics competition. It is not required +to use this as part of submissions, it is however useful for running the [examples provided](../README.md). We provide +an extension of [Unity's MLAgents](https://github.com/Unity-Technologies/ml-agents/tree/master/ml-agents) as well as +[dopamine](https://github.com/google/dopamine). 
+ +For more details and documentation have a look at the [AnimalAI documentation](../documentation) \ No newline at end of file diff --git a/examples/animalai_train/animalai_train/__init__.py b/examples/animalai_train/animalai_train/__init__.py new file mode 100644 index 00000000..fed01500 --- /dev/null +++ b/examples/animalai_train/animalai_train/__init__.py @@ -0,0 +1 @@ +name = "animalai_train" \ No newline at end of file diff --git a/examples/animalai_train/animalai_train/dopamine/animalai_lib.py b/examples/animalai_train/animalai_train/dopamine/animalai_lib.py new file mode 100644 index 00000000..532ea075 --- /dev/null +++ b/examples/animalai_train/animalai_train/dopamine/animalai_lib.py @@ -0,0 +1,270 @@ +# coding=utf-8 +# Copyright 2018 The Dopamine Authors. +# Modifications copyright 2019 Unity Technologies. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +"""Obstacle Tower-specific utilities including Atari-specific network architectures. + +This includes a class implementing minimal preprocessing, which +is in charge of: + . Converting observations to greyscale. +""" + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import math + +from animalai.envs.gym.environment import AnimalAIEnv + +import numpy as np +import tensorflow as tf + +import gin.tf +import cv2 + +slim = tf.contrib.slim + +NATURE_DQN_OBSERVATION_SHAPE = (84, 84) # Size of downscaled Atari 2600 frame. +NATURE_DQN_DTYPE = tf.uint8 # DType of Atari 2600 observations. +NATURE_DQN_STACK_SIZE = 4 # Number of frames in the state stack. + + +@gin.configurable +def create_animalai_environment(environment_path=None): + """Wraps the Animal AI environment with some basic preprocessing. + + Returns: + An Animal AI environment with some standard preprocessing. + """ + assert environment_path is not None + env = AnimalAIEnv(environment_path, 0, n_arenas=1, retro=True) + env = OTCPreprocessing(env) + return env + +@gin.configurable +def nature_dqn_network(num_actions, network_type, state): + """The convolutional network used to compute the agent's Q-values. + + Args: + num_actions: int, number of actions. + network_type: namedtuple, collection of expected values to return. + state: `tf.Tensor`, contains the agent's current state. + + Returns: + net: _network_type object containing the tensors output by the network. + """ + net = tf.cast(state, tf.float32) + net = tf.div(net, 255.) + net = slim.conv2d(net, 32, [8, 8], stride=4) + net = slim.conv2d(net, 64, [4, 4], stride=2) + net = slim.conv2d(net, 64, [3, 3], stride=1) + net = slim.flatten(net) + net = slim.fully_connected(net, 512) + q_values = slim.fully_connected(net, num_actions, activation_fn=None) + return network_type(q_values) + +@gin.configurable +def rainbow_network(num_actions, num_atoms, support, network_type, state): + """The convolutional network used to compute agent's Q-value distributions. + + Args: + num_actions: int, number of actions. 
+ num_atoms: int, the number of buckets of the value function distribution. + support: tf.linspace, the support of the Q-value distribution. + network_type: namedtuple, collection of expected values to return. + state: `tf.Tensor`, contains the agent's current state. + + Returns: + net: _network_type object containing the tensors output by the network. + """ + weights_initializer = slim.variance_scaling_initializer( + factor=1.0 / np.sqrt(3.0), mode='FAN_IN', uniform=True) + + net = tf.cast(state, tf.float32) + net = tf.div(net, 255.) + net = slim.conv2d( + net, 32, [8, 8], stride=4, weights_initializer=weights_initializer) + net = slim.conv2d( + net, 64, [4, 4], stride=2, weights_initializer=weights_initializer) + net = slim.conv2d( + net, 64, [3, 3], stride=1, weights_initializer=weights_initializer) + net = slim.flatten(net) + net = slim.fully_connected( + net, 512, weights_initializer=weights_initializer) + net = slim.fully_connected( + net, + num_actions * num_atoms, + activation_fn=None, + weights_initializer=weights_initializer) + + logits = tf.reshape(net, [-1, num_actions, num_atoms]) + probabilities = tf.contrib.layers.softmax(logits) + q_values = tf.reduce_sum(support * probabilities, axis=2) + return network_type(q_values, logits, probabilities) + +@gin.configurable +def implicit_quantile_network(num_actions, quantile_embedding_dim, + network_type, state, num_quantiles): + """The Implicit Quantile ConvNet. + + Args: + num_actions: int, number of actions. + quantile_embedding_dim: int, embedding dimension for the quantile input. + network_type: namedtuple, collection of expected values to return. + state: `tf.Tensor`, contains the agent's current state. + num_quantiles: int, number of quantile inputs. + + Returns: + net: _network_type object containing the tensors output by the network. + """ + weights_initializer = slim.variance_scaling_initializer( + factor=1.0 / np.sqrt(3.0), mode='FAN_IN', uniform=True) + + state_net = tf.cast(state, tf.float32) + state_net = tf.div(state_net, 255.) + state_net = slim.conv2d( + state_net, 32, [8, 8], stride=4, + weights_initializer=weights_initializer) + state_net = slim.conv2d( + state_net, 64, [4, 4], stride=2, + weights_initializer=weights_initializer) + state_net = slim.conv2d( + state_net, 64, [3, 3], stride=1, + weights_initializer=weights_initializer) + state_net = slim.flatten(state_net) + state_net_size = state_net.get_shape().as_list()[-1] + state_net_tiled = tf.tile(state_net, [num_quantiles, 1]) + + batch_size = state_net.get_shape().as_list()[0] + quantiles_shape = [num_quantiles * batch_size, 1] + quantiles = tf.random_uniform( + quantiles_shape, minval=0, maxval=1, dtype=tf.float32) + + quantile_net = tf.tile(quantiles, [1, quantile_embedding_dim]) + pi = tf.constant(math.pi) + quantile_net = tf.cast(tf.range( + 1, quantile_embedding_dim + 1, 1), tf.float32) * pi * quantile_net + quantile_net = tf.cos(quantile_net) + quantile_net = slim.fully_connected(quantile_net, state_net_size, + weights_initializer=weights_initializer) + # Hadamard product. + net = tf.multiply(state_net_tiled, quantile_net) + + net = slim.fully_connected( + net, 512, weights_initializer=weights_initializer) + quantile_values = slim.fully_connected( + net, + num_actions, + activation_fn=None, + weights_initializer=weights_initializer) + + return network_type(quantile_values=quantile_values, quantiles=quantiles) + +# +# @gin.configurable +# class AAIPreprocessing(object): +# """A class implementing image preprocessing for OTC agents. 
+# +# Specifically, this converts observations to greyscale. It doesn't +# do anything else to the environment. +# """ +# +# def __init__(self, environment): +# """Constructor for an Obstacle Tower preprocessor. +# +# Args: +# environment: Gym environment whose observations are preprocessed. +# +# """ +# self.environment = environment +# +# self.game_over = False +# self.lives = 0 # Will need to be set by reset(). +# +# @property +# def observation_space(self): +# return self.environment.observation_space +# +# @property +# def action_space(self): +# return self.environment.action_space +# +# @property +# def reward_range(self): +# return self.environment.reward_range +# +# @property +# def metadata(self): +# return self.environment.metadata +# +# def reset(self): +# """Resets the environment. Converts the observation to greyscale, +# if it is not. +# +# Returns: +# observation: numpy array, the initial observation emitted by the +# environment. +# """ +# observation = self.environment.reset() +# if (len(observation.shape) > 2): +# observation = cv2.cvtColor(observation, cv2.COLOR_RGB2GRAY) +# +# return observation +# +# def render(self, mode): +# """Renders the current screen, before preprocessing. +# +# This calls the Gym API's render() method. +# +# Args: +# mode: Mode argument for the environment's render() method. +# Valid values (str) are: +# 'rgb_array': returns the raw ALE image. +# 'human': renders to display via the Gym renderer. +# +# Returns: +# if mode='rgb_array': numpy array, the most recent screen. +# if mode='human': bool, whether the rendering was successful. +# """ +# return self.environment.render(mode) +# +# def step(self, action): +# """Applies the given action in the environment. Converts the observation to +# greyscale, if it is not. +# +# Remarks: +# +# * If a terminal state (from life loss or episode end) is reached, this may +# execute fewer than self.frame_skip steps in the environment. +# * Furthermore, in this case the returned observation may not contain valid +# image data and should be ignored. +# +# Args: +# action: The action to be executed. +# +# Returns: +# observation: numpy array, the observation following the action. +# reward: float, the reward following the action. +# is_terminal: bool, whether the environment has reached a terminal state. +# This is true when a life is lost and terminal_on_life_loss, or when the +# episode is over. +# info: Gym API's info data structure. 
+# """ +# +# observation, reward, game_over, info = self.environment.step(action) +# self.game_over = game_over +# if (len(observation.shape) > 2): +# observation = cv2.cvtColor(observation, cv2.COLOR_RGB2GRAY) +# return observation, reward, game_over, info diff --git a/animalai/trainers/__init__.py b/examples/animalai_train/animalai_train/trainers/__init__.py similarity index 100% rename from animalai/trainers/__init__.py rename to examples/animalai_train/animalai_train/trainers/__init__.py diff --git a/animalai/trainers/barracuda.py b/examples/animalai_train/animalai_train/trainers/barracuda.py similarity index 100% rename from animalai/trainers/barracuda.py rename to examples/animalai_train/animalai_train/trainers/barracuda.py diff --git a/animalai/trainers/bc/__init__.py b/examples/animalai_train/animalai_train/trainers/bc/__init__.py similarity index 100% rename from animalai/trainers/bc/__init__.py rename to examples/animalai_train/animalai_train/trainers/bc/__init__.py diff --git a/animalai/trainers/bc/models.py b/examples/animalai_train/animalai_train/trainers/bc/models.py similarity index 98% rename from animalai/trainers/bc/models.py rename to examples/animalai_train/animalai_train/trainers/bc/models.py index 06cdab6d..e1ef94d5 100644 --- a/animalai/trainers/bc/models.py +++ b/examples/animalai_train/animalai_train/trainers/bc/models.py @@ -1,6 +1,6 @@ import tensorflow as tf import tensorflow.contrib.layers as c_layers -from animalai.trainers.models import LearningModel +from animalai_train.trainers.models import LearningModel class BehavioralCloningModel(LearningModel): diff --git a/animalai/trainers/bc/offline_trainer.py b/examples/animalai_train/animalai_train/trainers/bc/offline_trainer.py similarity index 92% rename from animalai/trainers/bc/offline_trainer.py rename to examples/animalai_train/animalai_train/trainers/bc/offline_trainer.py index 36e209f2..ebdbc443 100644 --- a/animalai/trainers/bc/offline_trainer.py +++ b/examples/animalai_train/animalai_train/trainers/bc/offline_trainer.py @@ -5,9 +5,9 @@ import logging import copy -from animalai.trainers.bc.trainer import BCTrainer -from animalai.trainers.demo_loader import demo_to_buffer -from animalai.trainers.trainer import UnityTrainerException +from animalai_train.trainers.bc.trainer import BCTrainer +from animalai_train.trainers.demo_loader import demo_to_buffer +from animalai_train.trainers.trainer import UnityTrainerException logger = logging.getLogger("mlagents.trainers") diff --git a/animalai/trainers/bc/online_trainer.py b/examples/animalai_train/animalai_train/trainers/bc/online_trainer.py similarity index 99% rename from animalai/trainers/bc/online_trainer.py rename to examples/animalai_train/animalai_train/trainers/bc/online_trainer.py index d06ac321..91ba340c 100644 --- a/animalai/trainers/bc/online_trainer.py +++ b/examples/animalai_train/animalai_train/trainers/bc/online_trainer.py @@ -6,7 +6,7 @@ import numpy as np from animalai.envs import AllBrainInfo -from animalai.trainers.bc.trainer import BCTrainer +from animalai_train.trainers.bc.trainer import BCTrainer logger = logging.getLogger("mlagents.trainers") diff --git a/animalai/trainers/bc/policy.py b/examples/animalai_train/animalai_train/trainers/bc/policy.py similarity index 96% rename from animalai/trainers/bc/policy.py rename to examples/animalai_train/animalai_train/trainers/bc/policy.py index b9fd3bdb..f2d990b3 100644 --- a/animalai/trainers/bc/policy.py +++ b/examples/animalai_train/animalai_train/trainers/bc/policy.py @@ -1,8 +1,8 @@ import 
logging import numpy as np -from animalai.trainers.bc.models import BehavioralCloningModel -from animalai.trainers.policy import Policy +from animalai_train.trainers.bc.models import BehavioralCloningModel +from animalai_train.trainers.policy import Policy logger = logging.getLogger("mlagents.trainers") diff --git a/animalai/trainers/bc/trainer.py b/examples/animalai_train/animalai_train/trainers/bc/trainer.py similarity index 98% rename from animalai/trainers/bc/trainer.py rename to examples/animalai_train/animalai_train/trainers/bc/trainer.py index bdc2010f..dbd4d9c4 100644 --- a/animalai/trainers/bc/trainer.py +++ b/examples/animalai_train/animalai_train/trainers/bc/trainer.py @@ -9,9 +9,9 @@ import tensorflow as tf from animalai.envs import AllBrainInfo -from animalai.trainers.bc.policy import BCPolicy -from animalai.trainers.buffer import Buffer -from animalai.trainers.trainer import Trainer +from animalai_train.trainers.bc.policy import BCPolicy +from animalai_train.trainers.buffer import Buffer +from animalai_train.trainers.trainer import Trainer logger = logging.getLogger("mlagents.trainers") diff --git a/animalai/trainers/buffer.py b/examples/animalai_train/animalai_train/trainers/buffer.py similarity index 100% rename from animalai/trainers/buffer.py rename to examples/animalai_train/animalai_train/trainers/buffer.py diff --git a/animalai/trainers/curriculum.py b/examples/animalai_train/animalai_train/trainers/curriculum.py similarity index 100% rename from animalai/trainers/curriculum.py rename to examples/animalai_train/animalai_train/trainers/curriculum.py diff --git a/animalai/trainers/demo_loader.py b/examples/animalai_train/animalai_train/trainers/demo_loader.py similarity index 98% rename from animalai/trainers/demo_loader.py rename to examples/animalai_train/animalai_train/trainers/demo_loader.py index 02c8f68b..173615e4 100644 --- a/animalai/trainers/demo_loader.py +++ b/examples/animalai_train/animalai_train/trainers/demo_loader.py @@ -1,7 +1,7 @@ import pathlib import logging import os -from animalai.trainers.buffer import Buffer +from animalai_train.trainers.buffer import Buffer from animalai.envs.brain import BrainParameters, BrainInfo from animalai.communicator_objects import * from google.protobuf.internal.decoder import _DecodeVarint32 diff --git a/animalai/trainers/exception.py b/examples/animalai_train/animalai_train/trainers/exception.py similarity index 100% rename from animalai/trainers/exception.py rename to examples/animalai_train/animalai_train/trainers/exception.py diff --git a/animalai/trainers/learn.py b/examples/animalai_train/animalai_train/trainers/learn.py similarity index 98% rename from animalai/trainers/learn.py rename to examples/animalai_train/animalai_train/trainers/learn.py index 309e4f66..66050d9e 100644 --- a/animalai/trainers/learn.py +++ b/examples/animalai_train/animalai_train/trainers/learn.py @@ -12,9 +12,9 @@ from typing import Optional -from animalai.trainers.trainer_controller import TrainerController -from animalai.trainers.exception import TrainerError -from animalai.trainers import MetaCurriculumError, MetaCurriculum +from animalai_train.trainers.trainer_controller import TrainerController +from animalai_train.trainers.exception import TrainerError +from animalai_train.trainers import MetaCurriculumError, MetaCurriculum from animalai.envs import UnityEnvironment from animalai.envs.exception import UnityEnvironmentException diff --git a/animalai/trainers/meta_curriculum.py 
b/examples/animalai_train/animalai_train/trainers/meta_curriculum.py similarity index 97% rename from animalai/trainers/meta_curriculum.py rename to examples/animalai_train/animalai_train/trainers/meta_curriculum.py index f71e91e3..9809a887 100644 --- a/animalai/trainers/meta_curriculum.py +++ b/examples/animalai_train/animalai_train/trainers/meta_curriculum.py @@ -1,8 +1,8 @@ """Contains the MetaCurriculum class.""" import os -from animalai.trainers.curriculum import Curriculum -from animalai.trainers.exception import MetaCurriculumError +from animalai_train.trainers.curriculum import Curriculum +from animalai_train.trainers.exception import MetaCurriculumError import logging diff --git a/animalai/trainers/models.py b/examples/animalai_train/animalai_train/trainers/models.py similarity index 100% rename from animalai/trainers/models.py rename to examples/animalai_train/animalai_train/trainers/models.py diff --git a/animalai/trainers/policy.py b/examples/animalai_train/animalai_train/trainers/policy.py similarity index 98% rename from animalai/trainers/policy.py rename to examples/animalai_train/animalai_train/trainers/policy.py index ad18c501..dd23940a 100644 --- a/animalai/trainers/policy.py +++ b/examples/animalai_train/animalai_train/trainers/policy.py @@ -2,9 +2,9 @@ import numpy as np import tensorflow as tf -from animalai.trainers import UnityException +from animalai_train.trainers import UnityException from tensorflow.python.tools import freeze_graph -from animalai.trainers import tensorflow_to_barracuda as tf2bc +from animalai_train.trainers import tensorflow_to_barracuda as tf2bc logger = logging.getLogger("mlagents.trainers") diff --git a/animalai/trainers/ppo/__init__.py b/examples/animalai_train/animalai_train/trainers/ppo/__init__.py similarity index 100% rename from animalai/trainers/ppo/__init__.py rename to examples/animalai_train/animalai_train/trainers/ppo/__init__.py diff --git a/animalai/trainers/ppo/models.py b/examples/animalai_train/animalai_train/trainers/ppo/models.py similarity index 99% rename from animalai/trainers/ppo/models.py rename to examples/animalai_train/animalai_train/trainers/ppo/models.py index cb1546cd..519dfb50 100644 --- a/animalai/trainers/ppo/models.py +++ b/examples/animalai_train/animalai_train/trainers/ppo/models.py @@ -2,7 +2,7 @@ import numpy as np import tensorflow as tf -from animalai.trainers.models import LearningModel +from animalai_train.trainers.models import LearningModel logger = logging.getLogger("mlagents.envs") diff --git a/animalai/trainers/ppo/policy.py b/examples/animalai_train/animalai_train/trainers/ppo/policy.py similarity index 99% rename from animalai/trainers/ppo/policy.py rename to examples/animalai_train/animalai_train/trainers/ppo/policy.py index 925043e4..33bbba62 100644 --- a/animalai/trainers/ppo/policy.py +++ b/examples/animalai_train/animalai_train/trainers/ppo/policy.py @@ -1,8 +1,8 @@ import logging import numpy as np -from animalai.trainers.ppo.models import PPOModel -from animalai.trainers.policy import Policy +from animalai_train.trainers.ppo.models import PPOModel +from animalai_train.trainers.policy import Policy logger = logging.getLogger("mlagents.trainers") diff --git a/animalai/trainers/ppo/trainer.py b/examples/animalai_train/animalai_train/trainers/ppo/trainer.py similarity index 99% rename from animalai/trainers/ppo/trainer.py rename to examples/animalai_train/animalai_train/trainers/ppo/trainer.py index f5b4b422..817cd669 100644 --- a/animalai/trainers/ppo/trainer.py +++ 
b/examples/animalai_train/animalai_train/trainers/ppo/trainer.py
@@ -10,9 +10,9 @@ import tensorflow as tf
 from animalai.envs import AllBrainInfo, BrainInfo
-from animalai.trainers.buffer import Buffer
-from animalai.trainers.ppo.policy import PPOPolicy
-from animalai.trainers.trainer import Trainer
+from animalai_train.trainers.buffer import Buffer
+from animalai_train.trainers.ppo.policy import PPOPolicy
+from animalai_train.trainers.trainer import Trainer
 
 logger = logging.getLogger("mlagents.trainers")
diff --git a/animalai/trainers/tensorflow_to_barracuda.py b/examples/animalai_train/animalai_train/trainers/tensorflow_to_barracuda.py
similarity index 99%
rename from animalai/trainers/tensorflow_to_barracuda.py
rename to examples/animalai_train/animalai_train/trainers/tensorflow_to_barracuda.py
index f33f3402..a7942909 100644
--- a/animalai/trainers/tensorflow_to_barracuda.py
+++ b/examples/animalai_train/animalai_train/trainers/tensorflow_to_barracuda.py
@@ -5,8 +5,8 @@ import re
 #import barracuda
 #from barracuda import Struct
-from animalai.trainers import barracuda
-from animalai.trainers.barracuda import Struct
+from animalai_train.trainers import barracuda
+from animalai_train.trainers.barracuda import Struct
 from google.protobuf import descriptor
 from google.protobuf.json_format import MessageToJson
diff --git a/animalai/trainers/trainer.py b/examples/animalai_train/animalai_train/trainers/trainer.py
similarity index 100%
rename from animalai/trainers/trainer.py
rename to examples/animalai_train/animalai_train/trainers/trainer.py
diff --git a/animalai/trainers/trainer_controller.py b/examples/animalai_train/animalai_train/trainers/trainer_controller.py
similarity index 97%
rename from animalai/trainers/trainer_controller.py
rename to examples/animalai_train/animalai_train/trainers/trainer_controller.py
index 813a0a1e..e71a3624 100644
--- a/animalai/trainers/trainer_controller.py
+++ b/examples/animalai_train/animalai_train/trainers/trainer_controller.py
@@ -16,10 +16,10 @@ from animalai.envs import BrainInfo
 from animalai.envs.exception import UnityEnvironmentException
-from animalai.trainers.ppo.trainer import PPOTrainer
-from animalai.trainers.bc.offline_trainer import OfflineBCTrainer
-from animalai.trainers.bc.online_trainer import OnlineBCTrainer
-from animalai.trainers.meta_curriculum import MetaCurriculum
+from animalai_train.trainers.ppo.trainer import PPOTrainer
+from animalai_train.trainers.bc.offline_trainer import OfflineBCTrainer
+from animalai_train.trainers.bc.online_trainer import OnlineBCTrainer
+from animalai_train.trainers.meta_curriculum import MetaCurriculum
 
 
 class TrainerController(object):
@@ -183,7 +183,7 @@ def _reset_env(self, env):
             return env.reset(config=self.meta_curriculum.get_config())
         else:
             if self.update_config:
-                return env.reset(arenas_configurations_input=self.config)
+                return env.reset(arenas_configurations=self.config)
                 self.update_config = False
             else:
                 return env.reset()
diff --git a/examples/animalai_train/setup.py b/examples/animalai_train/setup.py
new file mode 100644
index 00000000..b50ea730
--- /dev/null
+++ b/examples/animalai_train/setup.py
@@ -0,0 +1,36 @@
+from setuptools import setup
+
+setup(
+    name='animalai_train',
+    version='0.5.0',
+    description='Animal AI competition training library',
+    url='https://github.com/beyretb/AnimalAI-Olympics',
+    author='Benjamin Beyret',
+    author_email='bb1010@ic.ac.uk',
+
+    classifiers=[
+        'Intended Audience :: Developers',
+        'Topic :: Scientific/Engineering :: Artificial Intelligence',
+        'License :: OSI Approved :: Apache Software License',
+        'Programming Language :: Python :: 3.6'
+    ],
+
+    packages=['animalai_train.trainers', 'animalai_train.trainers.bc', 'animalai_train.trainers.ppo',
+              'animalai_train.dopamine'],  # Required
+    zip_safe=False,
+
+    install_requires=[
+        'animalai>=0.4.2',
+        'dopamine-rl',
+        'tensorflow==1.12',
+        'matplotlib',
+        'Pillow>=4.2.1,<=5.4.1',
+        'numpy>=1.13.3,<=1.14.5',
+        'protobuf>=3.6,<3.7',
+        'grpcio>=1.11.0,<1.12.0',
+        'pyyaml>=5.1',
+        'atari-py',
+        'jsonpickle>=1.2',
+        'pypiwin32==223;platform_system=="Windows"'],
+    python_requires=">=3.5,<3.8",
+)
diff --git a/configs/allObjectsRandom.yaml b/examples/configs/allObjectsRandom.yaml
similarity index 100%
rename from configs/allObjectsRandom.yaml
rename to examples/configs/allObjectsRandom.yaml
diff --git a/configs/avoidance.yaml b/examples/configs/avoidance.yaml
similarity index 100%
rename from configs/avoidance.yaml
rename to examples/configs/avoidance.yaml
diff --git a/configs/exampleConfig.yaml b/examples/configs/exampleConfig.yaml
similarity index 100%
rename from configs/exampleConfig.yaml
rename to examples/configs/exampleConfig.yaml
diff --git a/configs/exampleTraining.yaml b/examples/configs/exampleTraining.yaml
similarity index 100%
rename from configs/exampleTraining.yaml
rename to examples/configs/exampleTraining.yaml
diff --git a/configs/justFood.yaml b/examples/configs/justFood.yaml
similarity index 100%
rename from configs/justFood.yaml
rename to examples/configs/justFood.yaml
diff --git a/configs/lightsOff.yaml b/examples/configs/lightsOff.yaml
similarity index 100%
rename from configs/lightsOff.yaml
rename to examples/configs/lightsOff.yaml
diff --git a/configs/movingFood.yaml b/examples/configs/movingFood.yaml
similarity index 100%
rename from configs/movingFood.yaml
rename to examples/configs/movingFood.yaml
diff --git a/configs/objectManipulation.yaml b/examples/configs/objectManipulation.yaml
similarity index 100%
rename from configs/objectManipulation.yaml
rename to examples/configs/objectManipulation.yaml
diff --git a/configs/obstacles.yaml b/examples/configs/obstacles.yaml
similarity index 100%
rename from configs/obstacles.yaml
rename to examples/configs/obstacles.yaml
diff --git a/configs/preferences.yaml b/examples/configs/preferences.yaml
similarity index 100%
rename from configs/preferences.yaml
rename to examples/configs/preferences.yaml
diff --git a/examples/configs/rainbow.gin b/examples/configs/rainbow.gin
new file mode 100644
index 00000000..1cc5e979
--- /dev/null
+++ b/examples/configs/rainbow.gin
@@ -0,0 +1,34 @@
+# Hyperparameters follow Hessel et al. (2018).
+import dopamine.agents.rainbow.rainbow_agent
+import animalai_train.dopamine.animalai_lib
+import dopamine.discrete_domains.run_experiment
+import dopamine.replay_memory.prioritized_replay_buffer
+import gin.tf.external_configurables
+
+RainbowAgent.num_atoms = 51
+RainbowAgent.vmax = 10.
+RainbowAgent.gamma = 0.99
+RainbowAgent.update_horizon = 3
+RainbowAgent.min_replay_history = 20000  # agent steps
+RainbowAgent.update_period = 4
+RainbowAgent.target_update_period = 8000  # agent steps
+RainbowAgent.epsilon_train = 0.01
+RainbowAgent.epsilon_eval = 0.001
+RainbowAgent.epsilon_decay_period = 250000  # agent steps
+RainbowAgent.replay_scheme = 'prioritized'
+RainbowAgent.tf_device = '/gpu:0'  # use '/cpu:*' for non-GPU version
+RainbowAgent.optimizer = @tf.train.AdamOptimizer()
+RainbowAgent.network = @animalai_lib.rainbow_network
+
+# Note these parameters are different from C51's.
+tf.train.AdamOptimizer.learning_rate = 0.0000625
+tf.train.AdamOptimizer.epsilon = 0.00015
+
+create_agent.agent_name = 'rainbow'
+Runner.num_iterations = 200
+Runner.training_steps = 250000  # agent steps
+Runner.evaluation_steps = 125000  # agent steps
+Runner.max_steps_per_episode = 27000  # agent steps
+
+WrappedPrioritizedReplayBuffer.replay_capacity = 1000000
+WrappedPrioritizedReplayBuffer.batch_size = 32
diff --git a/configs/trainer_config.yaml b/examples/configs/trainer_config.yaml
similarity index 100%
rename from configs/trainer_config.yaml
rename to examples/configs/trainer_config.yaml
diff --git a/examples/trainDopamine.py b/examples/trainDopamine.py
new file mode 100644
index 00000000..bf97a7c5
--- /dev/null
+++ b/examples/trainDopamine.py
@@ -0,0 +1,33 @@
+from animalai.envs.gym.environment import AnimalAIEnv
+from animalai.envs.arena_config import ArenaConfig
+from dopamine.agents.rainbow import rainbow_agent
+from dopamine.discrete_domains import run_experiment
+
+
+import random
+
+env_path = '../env/AnimalAI'
+worker_id = random.randint(1, 100)
+arena_config_in = ArenaConfig('configs/justFood.yaml')
+base_dir = 'models/dopamine'
+gin_files = ['configs/rainbow.gin']
+
+
+def create_env_fn():
+    env = AnimalAIEnv(environment_filename=env_path,
+                      worker_id=worker_id,
+                      n_arenas=1,
+                      arenas_configurations=arena_config_in,
+                      retro=True)
+    return env
+
+
+def create_agent_fn(sess, env, summary_writer):
+    return rainbow_agent.RainbowAgent(sess=sess, num_actions=env.action_space.n, summary_writer=summary_writer)
+
+
+run_experiment.load_gin_configs(gin_files, None)
+runner = run_experiment.Runner(base_dir=base_dir,
+                               create_agent_fn=create_agent_fn,
+                               create_environment_fn=create_env_fn)
+runner.run_experiment()
diff --git a/train.py b/examples/trainMLAgents.py
similarity index 96%
rename from train.py
rename to examples/trainMLAgents.py
index b291b582..38057b15 100644
--- a/train.py
+++ b/examples/trainMLAgents.py
@@ -1,4 +1,4 @@
-from animalai.trainers.trainer_controller import TrainerController
+from animalai_train.trainers.trainer_controller import TrainerController
 from animalai.envs import UnityEnvironment
 from animalai.envs.exception import UnityEnvironmentException
 from animalai.envs.arena_config import ArenaConfig
@@ -8,7 +8,7 @@
 
 # ML-agents parameters for training
-env_path = 'env/AnimalAI'
+env_path = '../env/AnimalAI'
 worker_id = random.randint(1, 100)
 seed = 10
 base_port = 5005
diff --git a/visualizeArena.py b/examples/visualizeArena.py
similarity index 94%
rename from visualizeArena.py
rename to examples/visualizeArena.py
index ddc7a5bd..09711631 100644
--- a/visualizeArena.py
+++ b/examples/visualizeArena.py
@@ -3,7 +3,7 @@
 import sys
 import random
 
-env_path = 'env/AnimalAI'
+env_path = '../env/AnimalAI'
 worker_id = random.randint(0, 200)
 run_seed = 1
 docker_target_name = None
@@ -41,7 +41,7 @@ def init_environment(env_path, docker_target_name, no_graphics, worker_id, seed)
 # We can pass a different configuration at each env.reset() call. You can therefore load different YAML files between
 # episodes or directly amend the arena_config_in which contains a dictionary of configurations for all arenas.
 # See animalai/envs/arena_config.py for the syntax
-env.reset(arenas_configurations_input=arena_config_in)
+env.reset(arenas_configurations=arena_config_in)
 
 try:
     while True:
diff --git a/visualizeLightsOff.py b/examples/visualizeLightsOff.py
similarity index 92%
rename from visualizeLightsOff.py
rename to examples/visualizeLightsOff.py
index 0d941c39..b1a55d49 100644
--- a/visualizeLightsOff.py
+++ b/examples/visualizeLightsOff.py
@@ -1,11 +1,11 @@
-from animalai.envs import UnityEnvironment
+from animalai.envs.environment import UnityEnvironment
 from animalai.envs.arena_config import ArenaConfig
 import random
 import numpy as np
 from matplotlib import pyplot as plt
 from matplotlib import animation
 
-env_path = 'env/AnimalAI'
+env_path = '../env/AnimalAI'
 worker_id = random.randint(1, 100)
 seed = 10
@@ -35,7 +35,7 @@
 )
 
 arena_config_in = ArenaConfig('configs/lightsOff.yaml')
-env.reset(arenas_configurations_input=arena_config_in)
+env.reset(arenas_configurations=arena_config_in)
 fig, axes = plt.subplots(2, 2)
 imshows = []
 for i in range(2):
diff --git a/requirementsOthers.txt b/requirementsOthers.txt
deleted file mode 100644
index 5aedba07..00000000
--- a/requirementsOthers.txt
+++ /dev/null
@@ -1,12 +0,0 @@
-tensorflow>=1.7,<1.8
-Pillow>=4.2.1
-matplotlib
-numpy>=1.13.3,<=1.14.5
-jupyter
-pytest>=3.2.2,<4.0.0
-docopt
-pyyaml
-jsonpickle
-matplotlib
-protobuf>=3.6,<3.7
-grpcio>=1.11.0,<1.12.0
diff --git a/requirementsWindows.txt b/requirementsWindows.txt
deleted file mode 100644
index c48ba5d3..00000000
--- a/requirementsWindows.txt
+++ /dev/null
@@ -1,13 +0,0 @@
-tensorflow>=1.7,<1.8
-Pillow>=4.2.1
-matplotlib
-numpy>=1.13.3,<=1.14.5
-jupyter
-pytest>=3.2.2,<4.0.0
-docopt
-pyyaml
-jsonpickle
-matplotlib
-protobuf>=3.6,<3.7
-grpcio>=1.11.0,<1.12.0
-pypiwin32==223
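
The recurring change in the scripts above is the rename of the `env.reset` keyword from `arenas_configurations_input` to `arenas_configurations`. The sketch below shows the new call in isolation; it is a minimal example assuming the environment binary lives at `../env/AnimalAI` and that a `configs/justFood.yaml` is available, and the `UnityEnvironment` constructor arguments shown (`file_name`, `worker_id`) are assumptions rather than something this patch defines.

```python
# Minimal sketch of the renamed reset keyword (constructor arguments are assumptions).
from animalai.envs.environment import UnityEnvironment
from animalai.envs.arena_config import ArenaConfig

env = UnityEnvironment(file_name='../env/AnimalAI',  # assumed path to the environment binary
                       worker_id=1)                  # assumed free worker id
arena_config_in = ArenaConfig('configs/justFood.yaml')

# 'arenas_configurations' replaces the old 'arenas_configurations_input' keyword.
env.reset(arenas_configurations=arena_config_in)
env.close()
```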