
Commit

Update README.md (#586)
* Updated compatibility info.
* Moved TonY jar publishing info to wiki https://github.com/tony-framework/TonY/wiki/Publishing-TonY-to-Maven-(for-admins)
* Moved virtualenv based usage in front of container based setup.
Keqiu Hu authored Aug 16, 2021
1 parent 4f7ead3 commit 0d6857f
140 changes: 57 additions & 83 deletions README.md
@@ -13,11 +13,10 @@ machine learning jobs reliably and flexibly. For a quick overview of TonY and co

## Compatibility Notes

It is recommended to run TonY with [Hadoop 3.1.1](https://hadoop.apache.org/old/releases.html#8+Aug+2018%3A+Release+3.1.1+available) and above. TonY itself is compatible with [Hadoop 2.6.0](https://hadoop.apache.org/docs/r2.6.0/) (CDH5.11.0) and above. If you need GPU isolation from TonY, you need [Hadoop 2.10](https://hadoop.apache.org/docs/r2.10.0/) or higher for Hadoop 2, or [Hadoop 3.1.0](https://hortonworks.com/blog/gpus-support-in-apache-hadoop-3-1-yarn-hdp-3/) or higher for Hadoop 3.

## Build

### How to build
TonY is built using [Gradle](https://github.com/gradle/gradle). To build TonY, run:

./gradlew build
@@ -28,89 +27,11 @@ This will automatically run tests; if you want to build without running tests, run:
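
A minimal form, assuming Gradle's standard `-x` flag for excluding the `test` task:

    ./gradlew build -x test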

The jar required to run TonY will be located in `./tony-cli/build/libs/`.

## Publishing (for admins)

Follow [this guide](https://blog.sonatype.com/2010/01/how-to-generate-pgp-signatures-with-maven/) to generate a key pair using GPG. Publish your public key.
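
For illustration, generating and publishing a key pair typically looks something like the following (the keyserver and key ID here are placeholders, not prescribed values):

```
# Generate a new key pair (interactive prompts for name, email, and passphrase)
gpg --gen-key

# List keys to find the ID of the key you just created
gpg --list-keys

# Upload the public key to a public keyserver
gpg --keyserver keyserver.ubuntu.com --send-keys <YOUR_KEY_ID>
```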

Create a Nexus account at https://oss.sonatype.org/ and request access to publish to com.linkedin.tony. Here's an example Jira ticket: https://issues.sonatype.org/browse/OSSRH-47350.

Configure your `~/.gradle/gradle.properties` file:

```
# signing plugin uses these
signing.keyId=...
signing.secretKeyRingFile=/home/<ldap>/.gnupg/secring.gpg
signing.password=...
# maven repo credentials
mavenUser=...
mavenPassword=...
# gradle-nexus-staging-plugin uses these
nexusUsername=<sameAsMavenUser>
nexusPassword=<sameAsMavenPassword>
```
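
Note that GPG 2.1 and newer no longer maintain a `secring.gpg` file by default; if it is missing, you can usually export one with:

```
gpg --export-secret-keys -o ~/.gnupg/secring.gpg
```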

Now you can publish and release artifacts by running `./gradlew publish closeAndReleaseRepository`.

## Usage

TonY is a Java library, so launching a job is as simple as running a Java program. There are two ways to launch your deep learning jobs with TonY:
- Use a zipped Python virtual environment.
- Use a Docker container.


### Use a zipped Python virtual environment

@@ -141,7 +62,7 @@ As you know, nothing comes for free. If you don't want to bother setting your cl
    > models/
      mnist_distributed.py
  tony.xml
  tony-cli-0.4.7-all.jar
  my-venv.zip  # The additional file you need.
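
If you still need to build `my-venv.zip`, one possible sketch (the environment name `Python` and the TensorFlow dependency are only examples; match them to your own setup) is:

```
# Create a virtual environment named "Python" and install your framework of choice
python3 -m venv Python
./Python/bin/pip install tensorflow

# Zip the environment so TonY can ship it to the cluster
zip -r my-venv.zip Python
```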

A similar `tony.xml`, but without the Docker-related configurations:
@@ -176,6 +97,59 @@ Then you can launch your job:
-python_binary_path Python/bin/python \ # relative path to the Python binary inside the my-venv.zip
-src_dir src

### Use a Docker container
Note that this requires a properly configured Hadoop cluster with Docker support. Check this [documentation](https://hadoop.apache.org/docs/r2.9.1/hadoop-yarn/hadoop-yarn-site/DockerContainers.html) if you are unsure how to set it up. Assuming you have set up your Hadoop cluster with the Docker container runtime, you should already have built a Docker image with the required Hadoop configurations. The remaining step is to install your Python dependencies (TensorFlow or PyTorch) inside that Docker image.
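
As a rough sketch only (the base image name below is hypothetical, and it assumes Python and pip are already present in that image), adding the Python dependencies on top of an existing Hadoop-enabled image could look like:

```
# Build a new image from a Hadoop-configured base image and add TensorFlow;
# the Dockerfile is passed to docker build on stdin, so no build context is needed.
docker build -t my-tony-image - <<'EOF'
FROM my-hadoop-base-image
RUN pip install tensorflow
EOF
```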

Below is a folder structure of what you need to launch the job:

MyJob/
  > src/
    > models/
      mnist_distributed.py
  tony.xml
  tony-cli-0.4.7-all.jar

The `src/` folder contains all your training scripts. The `tony.xml` file is used to configure your training job. Specifically, when using Docker as the container runtime, your configuration should look similar to the example below:

$ cat MyJob/tony.xml
<configuration>
  <property>
    <name>tony.worker.instances</name>
    <value>4</value>
  </property>
  <property>
    <name>tony.worker.memory</name>
    <value>4g</value>
  </property>
  <property>
    <name>tony.worker.gpus</name>
    <value>1</value>
  </property>
  <property>
    <name>tony.ps.memory</name>
    <value>3g</value>
  </property>
  <property>
    <name>tony.docker.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>tony.docker.containers.image</name>
    <value>YOUR_DOCKER_IMAGE_NAME</value>
  </property>
</configuration>

For a full list of configurations, please see the [wiki](https://github.com/linkedin/TonY/wiki/TonY-Configurations).

Now you're ready to launch your job:

$ java -cp "`hadoop classpath --glob`:MyJob/*:MyJob/" \
    com.linkedin.tony.cli.ClusterSubmitter \
    -executes models/mnist_distributed.py \
    -task_params '--input_dir /path/to/hdfs/input --output_dir /path/to/hdfs/output' \
    -src_dir src \
    -python_binary_path /home/user_name/python_virtual_env/bin/python

## TonY arguments
The command line arguments are as follows:

