diff --git a/docs/en/advanced_guides/models.md b/docs/en/advanced_guides/models.md
index 91361720eb..8202e95b7c 100644
--- a/docs/en/advanced_guides/models.md
+++ b/docs/en/advanced_guides/models.md
@@ -1 +1,179 @@
# Models
+
+# Models
+
+We usually define a neural network in a deep learning task as a model, and this model is the core of an algorithm. [MMEngine](https://github.com/open-mmlab/mmengine) abstracts a unified model [BaseModel](https://github.com/open-mmlab/mmengine/blob/main/mmengine/model/base_model/base_model.py#L16) to standardize the interfaces for training, testing and other processes. All models implemented by MMSegmentation inherit from `BaseModel`, and in MMSegmentation we implemented forward and added some functions for the semantic segmentation algorithm.
+
+## Common components
+
+### Segmentor
+
+In MMSegmentation, we abstract the network architecture as a **Segmentor**, it is a model that contains all components of a network. We have already implemented **EncoderDecoder** and **CascadeEncoderDecoder**, which typically consist of **Data preprocessor**, **Backbone**, **Decode head** and **Auxiliary head**.
+
+### Data preprocessor
+
+**Data preprocessor** is the part that copies data to the target device and preprocesses the data into the model input format.
+
+### Backbone
+
+**Backbone** is the part that transforms an image to feature maps, such as a **ResNet-50** without the last fully connected layer.
+
+### Neck
+
+**Neck** is the part that connects the backbone and heads. It performs some refinements or reconfigurations on the raw feature maps produced by the backbone. An example is **Feature Pyramid Network (FPN)**.
+
+### Decode Head
+
+**Decode Head** is the part that transforms the feature maps into a segmentation mask, such as **PSPNet**.
+
+### Auxiliary head
+
+**Auxiliary head** is an optional component that transforms the feature maps into segmentation masks which only used for computing auxiliary losses.
+
+## Basic interfaces
+
+MMSegmentation wraps `BaseModel` and implements the [BaseSegmentor](https://github.com/open-mmlab/mmsegmentation/blob/1.x/mmseg/models/segmentors/base.py#L15) class, which mainly provides the interfaces `forward`, `train_step`, `val_step` and `test_step`. The following will introduce these interfaces in detail.
+
+### forward
+
+
+
+ EncoderDecoder dataflow
+
+
+
+
+ CascadeEncoderDecoder dataflow
+
+
+The `forward` method returns losses or predictions of training, validation, testing, and a simple inference process.
+
+The method should accept three modes: "tensor", "predict" and "loss":
+
+- "tensor": Forward the whole network and return the tensor or tuple of tensor without any post-processing, same as a common `nn.Module`.
+- "predict": Forward and return the predictions, which are fully processed to a list of `SegDataSample`.
+- "loss": Forward and return a `dict` of losses according to the given inputs and data samples.
+
+**Note:** [SegDataSample](https://github.com/open-mmlab/mmsegmentation/blob/1.x/mmseg/structures/seg_data_sample.py) is a data structure interface of MMSegmentation, it is used as an interface between different components. `SegDataSample` implements the abstract data element `mmengine.structures.BaseDataElement`, please refer to [the SegDataSample documentation](https://mmsegmentation.readthedocs.io/en/1.x/advanced_guides/structures.html) and [data element documentation](https://mmengine.readthedocs.io/en/latest/advanced_tutorials/data_element.html) in [MMEngine](https://github.com/open-mmlab/mmengine) for more information.
+
+Note that this method doesn't handle either backpropagation or optimizer updating, which are done in the method `train_step`.
+
+Parameters:
+
+- inputs (torch.Tensor) - The input tensor with shape (N, C, ...) in general.
+- data_sample (list\[[SegDataSample](https://github.com/open-mmlab/mmsegmentation/blob/1.x/mmseg/structures/seg_data_sample.py)\]) - The seg data samples. It usually includes information such as `metainfo` and `gt_sem_seg`. Default to None.
+- mode (str) - Return what kind of value. Defaults to 'tensor'.
+
+Returns:
+
+- `dict` or `list`:
+ - If `mode == "loss"`, return a `dict` of loss tensor used for backward and logging.
+ - If `mode == "predict"`, return a `list` of `SegDataSample`, the inference results will be incrementally added to the `data_sample` parameter passed to the forward method, each `SegDataSample` contains the following keys:
+ - pred_sem_seg (`PixelData`): Prediction of semantic segmentation.
+ - seg_logits (`PixelData`): Predicted logits of semantic segmentation before normalization.
+ - If `mode == "tensor"`, return a `tensor` or `tuple of tensor` or `dict` of `tensor` for custom use.
+
+### prediction modes
+
+We briefly describe the fields of the model's configuration in [the config documentation](../user_guides/1_config.md), here we elaborate on the `model.test_cfg` field. `model.test_cfg` is used to control forward behavior, the `forward` method in `"predict"` mode can run in two modes:
+
+- `whole_inference`: If `cfg.model.test_cfg.mode == 'whole'`, model will inference with full images.
+
+ An `whole_inference` mode example config:
+
+ ```python
+ model = dict(
+ type='EncoderDecoder'
+ ...
+ test_cfg=dict(mode='whole')
+ )
+ ```
+
+- `slide_inference`: If `cfg.model.test_cfg.mode == 'slide'`, model will inference by sliding-window. **Note:** if you select the `slide` mode, `cfg.model.test_cfg.stride` and `cfg.model.test_cfg.crop_size` should also be specified.
+
+ An `slide_inference` mode example config:
+
+ ```python
+ model = dict(
+ type='EncoderDecoder'
+ ...
+ test_cfg=dict(mode='slide', crop_size=256, stride=170)
+ )
+ ```
+
+### train_step
+
+The `train_step` method calls the forward interface of the `loss` mode to get the loss `dict`. The `BaseModel` class implements the default model training process including preprocessing, model forward propagation, loss calculation, optimization, and back-propagation.
+
+Parameters:
+
+- data (dict or tuple or list) - Data sampled from the dataset. In MMSegmentation, the data dict contains `inputs` and `data_samples` two fields.
+- optim_wrapper (OptimWrapper) - OptimWrapper instance used to update model parameters.
+
+**Note:** [OptimWrapper](https://github.com/open-mmlab/mmengine/blob/main/mmengine/optim/optimizer/optimizer_wrapper.py#L17) provides a common interface for updating parameters, please refer to optimizer wrapper [documentation](https://mmengine.readthedocs.io/zh_CN/latest/tutorials/optim_wrapper.html) in [MMEngine](https://github.com/open-mmlab/mmengine) for more information.
+
+Returns:
+
+- Dict\[str, `torch.Tensor`\]: A `dict` of tensor for logging.
+
+
+
+ train_step dataflow
+
+
+### val_step
+
+The `val_step` method calls the forward interface of the `predict` mode and returns the prediction result, which is further passed to the process interface of the evaluator and the `after_val_iter` interface of the Hook.
+
+Parameters:
+
+- data (`dict` or `tuple` or `list`) - Data sampled from the dataset. In MMSegmentation, the data dict contains `inputs` and `data_samples` two fields.
+
+Returns:
+
+- `list` - The predictions of given data.
+
+
+
+ test_step/val_step dataflow
+
+
+### test_step
+
+The `BaseModel` implements `test_step` the same as `val_step`.
+
+## Data Preprocessor
+
+The [SegDataPreProcessor](https://github.com/open-mmlab/mmsegmentation/blob/1.x/mmseg/models/data_preprocessor.py#L13) implemented by MMSegmentation inherits from the [BaseDataPreprocessor](https://github.com/open-mmlab/mmengine/blob/main/mmengine/model/base_model/data_preprocessor.py#L18) implemented by [MMEngine](https://github.com/open-mmlab/mmengine) and provides the functions of data preprocessing and copying data to the target device.
+
+The runner carries the model to the specified device during the construction stage, while the data is carried to the specified device by the [SegDataPreProcessor](https://github.com/open-mmlab/mmsegmentation/blob/1.x/mmseg/models/data_preprocessor.py#L13) in `train_step`, `val_step`, and `test_step`, and the processed data is further passed to the model.
+
+The parameters of the `SegDataPreProcessor` constructor:
+
+- mean (Sequence\[Number\], optional) - The pixel mean of R, G, B channels. Defaults to None.
+- std (Sequence\[Number\], optional) - The pixel standard deviation of R, G, B channels. Defaults to None.
+- size (tuple, optional) - Fixed padding size.
+- size_divisor (int, optional) - The divisor of padded size.
+- pad_val (float, optional) - Padding value. Default: 0.
+- seg_pad_val (float, optional) - Padding value of segmentation map. Default: 255.
+- bgr_to_rgb (bool) - whether to convert image from BGR to RGB. Defaults to False.
+- rgb_to_bgr (bool) - whether to convert image from RGB to RGB. Defaults to False.
+- batch_augments (list\[dict\], optional) - Batch-level augmentations. Default to None.
+
+The data will be processed as follows:
+
+- Collate and move data to the target device.
+- Pad inputs to the input size with defined `pad_val`, and pad seg map with defined `seg_pad_val`.
+- Stack inputs to batch_inputs.
+- Convert inputs from bgr to rgb if the shape of input is (3, H, W).
+- Normalize image with defined std and mean.
+- Do batch augmentations like Mixup and Cutmix during training.
+
+The parameters of the `forward` method:
+
+- data (dict) - data sampled from dataloader.
+- training (bool) - Whether to enable training time augmentation.
+
+The returns of the `forward` method:
+
+- Dict: Data in the same format as the model input.
diff --git a/docs/en/user_guides/1_config.md b/docs/en/user_guides/1_config.md
index 86baa9705a..a5b16304e0 100644
--- a/docs/en/user_guides/1_config.md
+++ b/docs/en/user_guides/1_config.md
@@ -112,7 +112,7 @@ model = dict(
loss_weight=0.4)), # Loss weight of auxiliary_head.
# model training and testing settings
train_cfg=dict(), # train_cfg is just a place holder for now.
- test_cfg=dict(mode='whole')) # The test mode, options are 'whole' and 'sliding'. 'whole': whole image fully-convolutional test. 'sliding': sliding crop window on the image.
+ test_cfg=dict(mode='whole')) # The test mode, options are 'whole' and 'slide'. 'whole': whole image fully-convolutional test. 'slide': sliding crop window on the image.
```
`_base_/datasets/cityscapes.py` is the configuration file of the dataset
diff --git a/resources/cascade_encoder_decoder_dataflow.png b/resources/cascade_encoder_decoder_dataflow.png
new file mode 100644
index 0000000000..28e33d0527
Binary files /dev/null and b/resources/cascade_encoder_decoder_dataflow.png differ
diff --git a/resources/encoder_decoder_dataflow.png b/resources/encoder_decoder_dataflow.png
new file mode 100644
index 0000000000..33a8a49163
Binary files /dev/null and b/resources/encoder_decoder_dataflow.png differ
diff --git a/resources/test_step.png b/resources/test_step.png
new file mode 100644
index 0000000000..4d52351b85
Binary files /dev/null and b/resources/test_step.png differ
diff --git a/resources/train_step.png b/resources/train_step.png
new file mode 100644
index 0000000000..1e06105a06
Binary files /dev/null and b/resources/train_step.png differ