Update README about sidecar tensorboard and dependency timeout mechanism (#622)
zuston authored Dec 1, 2021
1 parent 1b7cc36 commit d0bd0ca
Showing 2 changed files with 20 additions and 2 deletions.
README.md: 5 changes (5 additions, 0 deletions)
@@ -163,6 +163,7 @@ The command line arguments are as follows:
| shell_env | no | --shell_env LD_LIBRARY_PATH=/usr/local/lib64/ | Specifies key-value pairs for environment variables that will be set in your Python worker/ps processes. |
| conf_file | no | --conf_file tony-local.xml | Location of a TonY configuration file. Remote paths are also supported, e.g. `--conf_file hdfs://nameservice01/user/tony/tony-remote.xml` |
| conf | no | --conf tony.application.security.enabled=false | Overrides configurations from your configuration file via the command line. |
| sidecar_tensorboard_log_dir | no | --sidecar_tensorboard_log_dir /hdfs/path/tensorboard_log_dir | HDFS path to the TensorBoard log directory. Setting it enables a sidecar TensorBoard managed by TonY. For a detailed example, see the tony-examples/mnist-tensorflow module. |
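
The `--conf` row above describes command-line values overriding what the configuration file sets. As a rough illustration of that precedence only (TonY itself is written in Java and this is not its code), the Python sketch below reads a Hadoop-style `<configuration>` XML string made of `<property>`/`<name>`/`<value>` entries and then applies `key=value` overrides from the command line, with the last override winning. The helper names `parse_conf_xml` and `apply_cli_overrides` are hypothetical; the property `tony.application.security.enabled` is taken from the `--conf` example in the table.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def parse_conf_xml(xml_text: str) -> Dict[str, str]:
    """Read <property><name>/<value> pairs from a Hadoop-style <configuration> document."""
    root = ET.fromstring(xml_text.strip())
    conf: Dict[str, str] = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        # Skip malformed entries that lack a name or a value element.
        if name is not None and value is not None:
            conf[name.strip()] = value.strip()
    return conf


def apply_cli_overrides(conf: Dict[str, str], overrides: List[str]) -> Dict[str, str]:
    """Return a new dict in which each 'key=value' override replaces the file value.

    Hypothetical helper: it only models 'command line beats config file'.
    """
    merged = dict(conf)  # copy so the file-derived values stay untouched
    for item in overrides:
        # Split on the first '=' only, so values may themselves contain '='.
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {item!r}")
        merged[key.strip()] = value.strip()  # later overrides win
    return merged


xml_text = """
<configuration>
  <property>
    <name>tony.application.security.enabled</name>
    <value>true</value>
  </property>
</configuration>
"""

file_conf = parse_conf_xml(xml_text)
final_conf = apply_cli_overrides(file_conf, ["tony.application.security.enabled=false"])
print(final_conf["tony.application.security.enabled"])  # prints: false
```

Merging into a copy keeps the file-derived values available, which makes it easy to compare which source a final value came from when debugging a job's configuration.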

## TonY configurations

@@ -211,3 +212,7 @@ For more information about TonY, check out the following:
2. How do I configure arbitrary TensorFlow job types?
Please see the [wiki](https://github.com/linkedin/TonY/wiki/TonY-Configurations#task-configuration) on TensorFlow task configuration for details.
3. Some of my TensorFlow workers hang after the chief finishes, or the evaluator hangs after the chief and workers finish.
Please see [PR#521](https://github.com/tony-framework/TonY/pull/621) on TensorFlow configuration for how to solve it.
tony-examples/mnist-tensorflow/README.md: 17 changes (15 additions, 2 deletions)
@@ -113,8 +113,12 @@ We have tested this example with 3 Workers (4 GB RAM + 1 vCPU) using MultiWorke
### TensorBoard Usage
TonY supports two modes (custom and sidecar) for starting TensorBoard.
1. [Custom] Start TensorBoard yourself in your training code; see the mnist_distributed.py example for details.
2. [Sidecar] Use the built-in sidecar TensorBoard; TonY launches and manages an extra TensorBoard task executor for you.
If the sidecar TensorBoard fails, the training job as a whole is not affected.
The only thing you need to do is specify the log directory in the TonY XML file or on the TonY CLI, as follows.
Tip: a value set on the TonY CLI takes precedence over the same setting in the TonY XML file.

tony.xml
```
<configuration>
....
@@ -123,4 +127,13 @@ TonY supports two modes(custom and sidecar) to start tensorboard.
<value>/tmp/xxxxxxx</value>
</property>
</configuration>
```

tony cli
```
$ java -cp "`hadoop classpath --glob`:MyJob/*:MyJob/" \
com.linkedin.tony.cli.ClusterSubmitter \
-executes models/mnist_distributed.py \
-task_params '--input_dir /path/to/hdfs/input --output_dir /path/to/hdfs/output' \
-src_dir src \
-python_binary_path /home/user_name/python_virtual_env/bin/python \
-sidecar_tensorboard_log_dir /path/to/hdfs/tensorboard_log_dir
```
