diff --git a/g3doc/_book.yaml b/g3doc/_book.yaml
index 55260607..10dd1834 100644
--- a/g3doc/_book.yaml
+++ b/g3doc/_book.yaml
@@ -22,6 +22,8 @@ upper_tabs:
       path: /cloud/tutorials/distributed_training_nasnet_with_tensorflow_cloud
     - title: Hyperparameter tuning on Google Cloud
       path: /cloud/tutorials/hp_tuning_cifar10_using_google_cloud
+    - title: Running vision models from TF Model Garden on GCP with TF Cloud
+      path: /cloud/tutorials/experimental/running_vision_models_from_tf_model_garden_on_gcp_with_tf_cloud
     - heading: "Guides"
     - title: Cloud `run` guide
       path: /cloud/guides/run_guide
diff --git a/g3doc/tutorials/experimental/running_vision_models_from_tf_model_garden_on_gcp_with_tf_cloud.ipynb b/g3doc/tutorials/experimental/running_vision_models_from_tf_model_garden_on_gcp_with_tf_cloud.ipynb
new file mode 100644
index 00000000..914ad5f8
--- /dev/null
+++ b/g3doc/tutorials/experimental/running_vision_models_from_tf_model_garden_on_gcp_with_tf_cloud.ipynb
@@ -0,0 +1,341 @@
+{
+  "nbformat": 4,
+  "nbformat_minor": 0,
+  "metadata": {
+    "colab": {
+      "name": "Running vision models from TF Model Garden on GCP with TF Cloud",
+      "provenance": [],
+      "collapsed_sections": [],
+      "toc_visible": true
+    },
+    "kernelspec": {
+      "name": "python3",
+      "display_name": "Python 3"
+    },
+    "language_info": {
+      "name": "python"
+    }
+  },
+  "cells": [
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "ApxORpbFShVP"
+      },
+      "source": [
+        "##### Copyright 2021 The TensorFlow Cloud Authors."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "eR70XKMMmC8I",
+        "cellView": "form"
+      },
+      "source": [
+        "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n",
+        "# you may not use this file except in compliance with the License.\n",
+        "# You may obtain a copy of the License at\n",
+        "#\n",
+        "# https://www.apache.org/licenses/LICENSE-2.0\n",
+        "#\n",
+        "# Unless required by applicable law or agreed to in writing, software\n",
+        "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
+        "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
+        "# See the License for the specific language governing permissions and\n",
+        "# limitations under the License."
+      ],
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "wKcTRRxsAmDl"
+      },
+      "source": [
+        "# Running vision models from TF Model Garden on GCP with TF Cloud\n",
+        "\n",
+        "<table class=\"tfo-notebook-buttons\" align=\"left\">\n",
+        "  <td>\n",
+        "    <a target=\"_blank\" href=\"https://www.tensorflow.org/cloud/tutorials/experimental/running_vision_models_from_tf_model_garden_on_gcp_with_tf_cloud\"><img src=\"https://www.tensorflow.org/images/tf_logo_32px.png\" />View on TensorFlow.org</a>\n",
+        "  </td>\n",
+        "  <td>\n",
+        "    <a target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/cloud/blob/master/g3doc/tutorials/experimental/running_vision_models_from_tf_model_garden_on_gcp_with_tf_cloud.ipynb\"><img src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" />Run in Google Colab</a>\n",
+        "  </td>\n",
+        "  <td>\n",
+        "    <a target=\"_blank\" href=\"https://github.com/tensorflow/cloud/blob/master/g3doc/tutorials/experimental/running_vision_models_from_tf_model_garden_on_gcp_with_tf_cloud.ipynb\"><img src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" />View source on GitHub</a>\n",
+        "  </td>\n",
+        "</table>"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "FAUbwFuJB3bw"
+      },
+      "source": [
+        "In this example, we use [run_models](https://github.com/tensorflow/cloud/blob/690c3eee65dadee8af260a19341ff23f42f1f070/src/python/tensorflow_cloud/core/experimental/models.py#L30) from the experimental module of TF Cloud to train a ResNet model from [TF Model Garden](https://github.com/tensorflow/models/tree/master/official) on [imagenette from TFDS](https://www.tensorflow.org/datasets/catalog/imagenette)."
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "EFCSAVDbC8-W"
+      },
+      "source": [
+        "## Install Packages\n",
+        "\n",
+        "We need the nightly version of tensorflow-cloud, which we install from GitHub, along with the official release of tf-models-official and keras 2.6.0rc0 for compatibility."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "r4sSs1azu-Ti"
+      },
+      "source": [
+        "!pip install -q 'git+https://github.com/tensorflow/cloud.git@refs/pull/352/head#egg=tensorflow-cloud&subdirectory=src/python' tf-models-official keras==2.6.0rc0"
+      ],
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "N3NC5vrDslsf"
+      },
+      "source": [
+        "## Import required modules"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "sdkgm_6PvHkk",
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "outputId": "c17384b4-07f1-493c-edf8-5eadde79524f"
+      },
+      "source": [
+        "import os\n",
+        "import sys\n",
+        "\n",
+        "import tensorflow_cloud as tfc\n",
+        "from tensorflow_cloud.core.experimental.models import run_models\n",
+        "\n",
+        "print(tfc.__version__)"
+      ],
+      "execution_count": 2,
+      "outputs": [
+        {
+          "output_type": "stream",
+          "text": [
+            "0.1.17.dev\n"
+          ],
+          "name": "stdout"
+        }
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "Ka6MHtF-tTU1"
+      },
+      "source": [
+        "## Project Configurations\n",
+        "Setting project parameters. For more details on Google Cloud-specific parameters, please refer to the [Google Cloud Project Setup Instructions](https://www.kaggle.com/nitric/google-cloud-project-setup-instructions/)."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "OFPPSLF9vx4H"
+      },
+      "source": [
+        "# Set Google Cloud-specific parameters\n",
+        "\n",
+        "# TODO: Please set GCP_PROJECT_ID to your own Google Cloud project ID.\n",
+        "GCP_PROJECT_ID = 'YOUR_PROJECT_ID' #@param {type:\"string\"}\n",
+        "\n",
+        "# TODO: set GCS_BUCKET to your own Google Cloud Storage (GCS) bucket.\n",
+        "GCS_BUCKET = 'YOUR_GCS_BUCKET_NAME' #@param {type:\"string\"}\n",
+        "\n",
+        "# DO NOT CHANGE: Currently only the 'us-central1' region is supported.\n",
+        "REGION = 'us-central1'\n",
+        "\n",
+        "# OPTIONAL: You can change the job name to any string.\n",
+        "JOB_NAME = 'run_models_demo' #@param {type:\"string\"}"
+      ],
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "F1_shlH4tUM5"
+      },
+      "source": [
+        "## Authenticating the notebook to use your Google Cloud Project\n",
+        "\n",
+        "This code authenticates the notebook by checking for valid Google Cloud credentials and identity. It is inside the `if not tfc.remote()` block to ensure that it is only run in the notebook, and will not be run when the notebook code is sent to Google Cloud.\n",
+        "\n",
+        "Note: For Kaggle notebooks, click \"Add-ons\" -> \"Google Cloud SDK\" before running the cell below."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "EeW7IHBgtPJD"
+      },
+      "source": [
+        "if not tfc.remote():\n",
+        "\n",
+        "    # Authentication for Kaggle notebooks\n",
+        "    if \"kaggle_secrets\" in sys.modules:\n",
+        "        from kaggle_secrets import UserSecretsClient\n",
+        "        UserSecretsClient().set_gcloud_credentials(project=GCP_PROJECT_ID)\n",
+        "\n",
+        "    # Authentication for Colab notebooks\n",
+        "    if \"google.colab\" in sys.modules:\n",
+        "        from google.colab import auth\n",
+        "        auth.authenticate_user()\n",
+        "        os.environ[\"GOOGLE_CLOUD_PROJECT\"] = GCP_PROJECT_ID"
+      ],
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "EQrVntO2twh1"
+      },
+      "source": [
+        "## Set up the TensorFlow Cloud run\n",
+        "\n",
+        "Set up the parameters for `tfc.run()`. The `chief_config`, `worker_count`, and `worker_config` are set up individually for each distribution strategy. For more details, refer to the [TensorFlow Cloud overview tutorial](https://colab.research.google.com/github/tensorflow/cloud/blob/master/g3doc/tutorials/overview.ipynb)."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "o539iLTKv9a3"
+      },
+      "source": [
+        "with open('requirements.txt', 'w') as f:\n",
+        "    f.write('git+https://github.com/tensorflow/cloud.git@refs/pull/352/head#egg=tensorflow-cloud&subdirectory=src/python\\n' +\n",
+        "            'tf-models-official\\n' +\n",
+        "            'keras==2.6.0rc0')\n",
+        "\n",
+        "run_kwargs = dict(\n",
+        "    requirements_txt='requirements.txt',\n",
+        "    docker_config=tfc.DockerConfig(\n",
+        "        parent_image=\"gcr.io/deeplearning-platform-release/tf2-gpu.2-5\",\n",
+        "        image_build_bucket=GCS_BUCKET\n",
+        "    ),\n",
+        "    chief_config=tfc.COMMON_MACHINE_CONFIGS[\"P100_4X\"],\n",
+        "    job_labels={'job': JOB_NAME}\n",
+        ")"
+      ],
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "hd4luG7nt3_0"
+      },
+      "source": [
+        "## Run the training using run_models"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "_aVt71qpxHUe"
+      },
+      "source": [
+        "values = run_models(\n",
+        "    'imagenette',\n",
+        "    'resnet',\n",
+        "    GCS_BUCKET,\n",
+        "    'train',\n",
+        "    'validation',\n",
+        "    **run_kwargs,\n",
+        ")"
+      ],
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "Ku7oBH8iuc2X"
+      },
+      "source": [
+        "## Training Results\n",
+        "\n",
+        "### Reconnect your Colab instance\n",
+        "\n",
+        "Most remote training jobs are long-running. If you are using Colab, it may time out before the training results are available.\n",
+        "\n",
+        "In that case, **rerun the following sections in order** to reconnect and configure your Colab instance to access the training results.\n",
+        "\n",
+        "1. Import required modules\n",
+        "2. Project Configurations\n",
+        "3. Authenticating the notebook to use your Google Cloud Project\n",
+        "\n",
+        "**DO NOT** rerun the rest of the code.\n",
+        "\n",
+        "### Load TensorBoard\n",
+        "\n",
+        "While the training is in progress, you can use TensorBoard to view the results. Note that the results will show only after your training has started. This may take a few minutes."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "rhVCh8x9upRY"
+      },
+      "source": [
+        "if not tfc.remote():\n",
+        "    %load_ext tensorboard\n",
+        "    tensorboard_logs_dir = values['tensorboard_logs']\n",
+        "    %tensorboard --logdir $tensorboard_logs_dir"
+      ],
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "kOU5Gu4Ku1Qc"
+      },
+      "source": [
+        "### Load your trained model\n",
+        "\n",
+        "Once training is complete, you can retrieve your model from the GCS bucket you specified above."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "rHoQnqKhu2Y8"
+      },
+      "source": [
+        "import tensorflow as tf\n",
+        "\n",
+        "if not tfc.remote():\n",
+        "    trained_model = tf.keras.models.load_model(values['saved_model'])\n",
+        "    trained_model.summary()"
+      ],
+      "execution_count": null,
+      "outputs": []
+    }
+  ]
+}
\ No newline at end of file
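Reviewer note: the two post-training cells in this notebook both read from the dict returned by `run_models` (the `'tensorboard_logs'` and `'saved_model'` keys). A minimal sketch of that lookup logic follows, with the `run_models` result stubbed out so it can be sanity-checked without a GCP project; the `collect_artifacts` helper and the `gs://` paths are illustrative only, not part of the tensorflow-cloud API.

```python
# Sketch only: mirrors how the notebook consumes the run_models result.
# The gs:// paths are placeholders, not real bucket contents.

def collect_artifacts(values):
    """Pull the TensorBoard log dir and SavedModel path from a run_models result dict."""
    return values["tensorboard_logs"], values["saved_model"]

# Stand-in for the dict returned by run_models in the notebook.
stub_values = {
    "tensorboard_logs": "gs://YOUR_GCS_BUCKET_NAME/logs",
    "saved_model": "gs://YOUR_GCS_BUCKET_NAME/saved_model",
}

logs_dir, model_path = collect_artifacts(stub_values)
print(logs_dir)    # gs://YOUR_GCS_BUCKET_NAME/logs
print(model_path)  # gs://YOUR_GCS_BUCKET_NAME/saved_model
```

In the notebook itself, `logs_dir` feeds `%tensorboard --logdir` and `model_path` feeds `tf.keras.models.load_model`, both guarded by `if not tfc.remote():` so they only run locally.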