Accelerator and custom loop typos.
jonthegeek committed Apr 28, 2022
1 parent 22036ac commit 2088ab5
Showing 2 changed files with 34 additions and 35 deletions.
21 changes: 7 additions & 14 deletions vignettes/accelerator.Rmd
@@ -18,20 +18,16 @@ knitr::opts_chunk$set(
library(luz)
```

The Accelerator API is a simplified port of the Hugging Face [Accelerate library](https://github.com/huggingface/accelerate). Currently it only handles CPU and
single GPU usage but allows users avoid the boilerplate code necessary to write
training loops that works correctly on both devices.
The Accelerator API is a simplified port of the Hugging Face [Accelerate library](https://github.com/huggingface/accelerate).
It allows users to avoid the boilerplate code necessary to write training loops that work correctly on both devices.
Currently it only handles CPU and single-GPU usage.

This API is meant to be the most flexible way you can use the luz package. With
the Accelerator API, you write the raw torch training loop and with a few code
changes you handle device placement of model, optimizers and dataloaders so you
don't need to add many `$to(device="cuda")` in your code or think about the order
to create model and optimizers.
This API is meant to be the most flexible way you can use the luz package.
With the Accelerator API, you write the raw torch training loop and, with a few code changes, you automatically handle device placement of the model, optimizers and dataloaders, so you don't need to add many `$to(device="cuda")` calls in your code or think about the order in which to create the model and optimizers.

## Example

The Accelerator API is best explained by showing an example diff in a raw torch
training loop.
The Accelerator API is best explained by showing an example diff in a raw torch training loop.

```diff
library(torch)
@@ -68,9 +64,6 @@ coro::loop(for (batch in dl) {
})
```

With the following changes to your code you no longer need to manually move
data and parameters between devices which makes your code easier to read and
less error prone.
With the code changes shown, you no longer need to manually move data and parameters between devices, which makes your code easier to read and less error prone.

You can find additional documentation using `help(accelerator)`.
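For reference, a minimal end-to-end sketch of a prepared loop might look like the following. This is not the vignette's exact code: the toy data and model are illustrative, `%<-%` comes from the zeallot package, and we assume the object returned by `accelerator()` exposes a `$prepare()` method (mirroring the Hugging Face API it ports) that returns the prepared model, optimizer, and dataloader in order.

```r
library(torch)
library(luz)
library(zeallot)  # provides the %<-% multiple-assignment operator

# Create the accelerator: currently it targets the CPU or a single GPU.
acc <- accelerator()

# Illustrative toy data and model.
data <- tensor_dataset(x = torch_randn(100, 10), y = torch_rand(100, 1))
dl <- dataloader(data, batch_size = 10)
model <- nn_linear(10, 1)
opt <- optim_adam(model$parameters)

# Assumed API: `$prepare()` handles device placement of the model,
# optimizer, and dataloader, as in Hugging Face Accelerate.
c(model, opt, dl) %<-% acc$prepare(model, opt, dl)

model$train()
coro::loop(for (batch in dl) {
  opt$zero_grad()
  pred <- model(batch$x)               # no manual $to(device = ...) calls
  loss <- nnf_mse_loss(pred, batch$y)
  loss$backward()
  opt$step()
})
```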

48 changes: 27 additions & 21 deletions vignettes/custom-loop.Rmd
@@ -22,19 +22,20 @@ library(luz)

Luz is a higher-level API for torch, designed to be highly flexible: it provides a layered API that is useful no matter the level of control you need for your training loop.

In the getting started vignette we have seen the basics of luz and how to quickly modify parts of the training loop using callbacks and custom metrics. In this document we will find describe how luz allows the user to get fine grained control of the training loop.
In the getting started vignette we have seen the basics of luz and how to quickly modify parts of the training loop using callbacks and custom metrics.
In this document we will describe how luz allows the user to get fine-grained control of the training loop.

A part from the use of callbacks there are three more ways that you can use luz depending on how much control you need:
Apart from the use of callbacks, there are three more ways that you can use luz (depending on how much control you need):

- **Multiple optimizers or losses**: You might be optimizing two loss functions each with its own optimizer, but you still don't want to modify the `backward()` - `zero_grad()` and `step()` calls. This is common in models like GANs (Generative Adversarial Networks) when you have competing neural networks trained with different losses and optimizers.
- **Multiple optimizers or losses:** You might be optimizing two loss functions each with its own optimizer, but you still don't want to modify the `backward()` - `zero_grad()` and `step()` calls. This is common in models like GANs (Generative Adversarial Networks) when you have competing neural networks trained with different losses and optimizers.

- **Fully flexible step:** You might want to be in control of how to call `backward()`, `zero_grad()`and `step()` as well as maybe having more control of gradient computation. For example, you might want to use 'virtual batch sizes', ie. you accumulate the gradients for a few steps before updating the weights.
- **Fully flexible steps:** You might want to be in control of how to call `backward()`, `zero_grad()` and `step()`. You might also want to have more control of gradient computation. For example, you might want to use 'virtual batch sizes', where you accumulate the gradients for a few steps before updating the weights (see the sketch after this list).

- **Completely flexible loop**: Your training loop can be anything you want but you still want to use luz to handle device placement of the dataloaders, optimizers and models. See the accelerator vignette.
- **Completely flexible loops:** Your training loop can be anything you want but you still want to use luz to handle device placement of the dataloaders, optimizers and models. See `vignette("accelerator")`.
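As an illustration of the 'virtual batch size' idea from the second bullet, here is a hedged sketch in plain torch (independent of luz); the toy data, model, and `accum_steps` value are illustrative assumptions.

```r
library(torch)

# Illustrative toy data and model.
data <- tensor_dataset(x = torch_randn(100, 10), y = torch_rand(100, 1))
dl <- dataloader(data, batch_size = 8)
model <- nn_linear(10, 1)
opt <- optim_adam(model$parameters)

accum_steps <- 4  # virtual batch size = 4 * 8 = 32

model$train()
i <- 0
coro::loop(for (batch in dl) {
  pred <- model(batch$x)
  # Scale the loss so accumulated gradients average over the virtual batch.
  loss <- nnf_mse_loss(pred, batch$y) / accum_steps
  loss$backward()   # gradients accumulate in each parameter's $grad field
  i <- i + 1
  if (i %% accum_steps == 0) {
    opt$step()      # update the weights once per virtual batch
    opt$zero_grad()
  }
})
```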

Let's consider a simplified version of the `net` that we implemented in the getting started vignette:

``` {.r}
```{r}
net <- nn_module(
"Net",
initialize = function() {
@@ -52,7 +53,7 @@ net <- nn_module(

Using the highest level of the luz API, we would fit it with:

``` {.r}
```{r}
fitted <- net %>%
setup(
loss = nn_cross_entropy_loss(),
@@ -66,9 +67,10 @@ fitted <- net %>%

## Multiple optimizers

Suppose we want to do an experiment where we train the first fully connected layer using a learning rate of 0.1 and the second one using learning rate of 0.01. Both minimizing the same `nn_cross_entropy_loss()` but for the first layer we want to add L1 regularization on the weights.
Suppose we want to do an experiment where we train the first fully connected layer using a learning rate of 0.1 and the second one using a learning rate of 0.01.
We will minimize the same `nn_cross_entropy_loss()` for both, but for the first layer we want to add L1 regularization on the weights.

In order to use luz for this we will implement two methods in the `net` module:
In order to use luz for this, we will implement two methods in the `net` module:

- `set_optimizers`: returns a named list of optimizers depending on the `ctx`.

@@ -106,25 +108,29 @@ net <- nn_module(
)
```

Notice that model optimizers will be initialized according to the `set_optimizers()` method return value. In this case, we are initializing the optimizers using different model parameters and learning rates.
Notice that the model optimizers will be initialized according to the `set_optimizers()` method's return value (a list).
In this case, we are initializing the optimizers using different model parameters and learning rates.

The `loss()` method is responsible for computing the loss that will be then backpropagated to compute gradients and update the weights. This `loss()` method can access the `ctx` object that will contain a `opt_name` field, describing which optimizer is currently being used. Note that this function will be called once for each optimizer for each training and validation step. See `help("ctx")` for complete information about the context object.
The `loss()` method is responsible for computing the loss that will then be back-propagated to compute gradients and update the weights.
This `loss()` method can access the `ctx` object that will contain an `opt_name` field, describing which optimizer is currently being used.
Note that this function will be called once for each optimizer for each training and validation step.
See `help("ctx")` for complete information about the context object.

We can finally `setup` and `fit` this module; however, we no longer need to specify optimizers and loss functions.

```{r}
fitted <- net %>%
setup(metrics = list(
luz_metric_accuracy
)) %>%
setup(metrics = list(luz_metric_accuracy)) %>%
fit(train_dl, epochs = 10, valid_data = test_dl)
```

Now let's re-implement this same model using the slightly more flexible approach of consisting in overriding the training and validation step.
Now let's re-implement this same model using the slightly more flexible approach of overriding the training and validation step.

## **Fully flexible step**
## Fully flexible step

Instead of implementing the `loss()` method we can implement the `step()` method, this allows us to flexibly modify what happens when training and validating for each batch in the dataset. You are now responsible for updating the weights by stepping the optimizers and backpropagating the loss.
Instead of implementing the `loss()` method, we can implement the `step()` method.
This allows us to flexibly modify what happens when training and validating for each batch in the dataset.
You are now responsible for updating the weights by stepping the optimizers and back-propagating the loss.

```{r}
net <- nn_module(
@@ -173,18 +179,18 @@ net <- nn_module(

The important things to notice here are:

- The `step()` method is used for both training and validation. You need to be careful only modify the weights when training. Again, you can get complete information regarding the context object using `help("ctx")`.
- The `step()` method is used for both training and validation. You need to be careful to only modify the weights when training. Again, you can get complete information regarding the context object using `help("ctx")`.

- `ctx$optimizers` is a named list holding each optimizer that was created when the `set_optimizers()` method was called.

- You need to manually track the losses by saving saving them in a named list in `ctx$loss`. By convention, we use the same name as the optimizer it refers to. It's good practice to `detach()` them before saving to reduce memory usage.
- You need to manually track the losses by saving them in a named list in `ctx$loss`. By convention, we use the same name as the optimizer it refers to. It is good practice to `detach()` them before saving to reduce memory usage.

- Callbacks that would be called inside the default `step()` method like `on_train_batch_after_pred`, `on_train_batch_after_loss` , etc won't be automatically called. You can still cal them manually by adding `ctx$call_callbacks("<callback name>")` inside you training step. See the code for `fit_one_batch()` and `valid_one_batch` to find all the callbacks that won't be called.
- Callbacks that would be called inside the default `step()` method, like `on_train_batch_after_pred`, `on_train_batch_after_loss`, etc., won't be automatically called. You can still call them manually by adding `ctx$call_callbacks("<callback name>")` inside your training step. See the code for `fit_one_batch()` and `valid_one_batch()` to find all the callbacks that won't be called.
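Putting these points together, a minimal `step()` might look like the sketch below. This is an illustration rather than the vignette's exact code; it reuses the two-optimizer module sketched earlier and assumes the context also exposes `input`, `target`, and `training` fields (see `help("ctx")`).

```r
net <- nn_module(
  "Net",
  initialize = function() {
    self$fc1 <- nn_linear(100, 50)  # illustrative sizes, as before
    self$fc2 <- nn_linear(50, 10)
  },
  forward = function(x) {
    self$fc2(nnf_relu(self$fc1(x)))
  },
  set_optimizers = function(lr_fc1 = 0.1, lr_fc2 = 0.01) {
    list(
      fc1 = optim_adam(self$fc1$parameters, lr = lr_fc1),
      fc2 = optim_adam(self$fc2$parameters, lr = lr_fc2)
    )
  },
  step = function() {
    ctx$loss <- list()
    # One forward/backward pass per optimizer from set_optimizers().
    for (opt_name in names(ctx$optimizers)) {
      pred <- ctx$model(ctx$input)
      loss <- nnf_cross_entropy(pred, ctx$target)
      if (opt_name == "fc1")  # L1 penalty on the first layer only
        loss <- loss + torch_norm(self$fc1$weight, p = 1)

      if (ctx$training) {     # only update the weights while training
        opt <- ctx$optimizers[[opt_name]]
        opt$zero_grad()
        loss$backward()
        opt$step()
      }

      # Track each loss under its optimizer's name; detach() saves memory.
      ctx$loss[[opt_name]] <- loss$detach()
    }
  }
)
```

For 'virtual batch sizes' you would instead call `opt$step()` and `opt$zero_grad()` only once every few batches, as in the accumulation sketch earlier in this document.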

## Next steps

In this article you learned how to customize the `step()` of your training loop using luz's layered functionality.

Luz also allows more flexible modifications of the training loop described in the Accelerator vignette.
Luz also allows more flexible modifications of the training loop, as described in the Accelerator vignette (`vignette("accelerator")`).

You should now be able to follow the examples marked with the 'intermediate' and 'advanced' categories in the [examples gallery](https://mlverse.github.io/luz/articles/examples/index.html).
