diff --git a/README.md b/README.md
index f9b018d8..98401c5b 100644
--- a/README.md
+++ b/README.md
@@ -139,9 +139,19 @@ Example from SMAP test set:
Example from MSL test set (note that one anomaly segment is not detected):
+## Model Overview
+
+
+
+The figure above is adapted from [Zhao et al. (2020)](https://arxiv.org/pdf/2009.02040.pdf).
+
+1. The raw input data is preprocessed, and then a 1-D convolution is applied in the temporal dimension in order to smooth the data and alleviate possible noise effects.
+2. The output of the 1-D convolution module is processed by two parallel graph attention (GAT) layers, one feature-oriented and one time-oriented, in order to capture dependencies among features and among timestamps, respectively.
+3. The outputs of the 1-D convolution module and the two GAT modules are concatenated and fed to a GRU layer to capture longer sequential patterns.
+4. The output of the GRU layer is fed into a forecasting model and a reconstruction model, which produce a prediction for the next timestamp and a reconstruction of the input sequence, respectively.
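
To make the data flow in the four steps above concrete, here is a minimal shape-tracing sketch in plain Python. It is not the repository's implementation: the function name `trace_shapes`, the `gru_hidden` default, and the assumptions that the convolution is padded to preserve the window length and that only the GRU's final hidden state is passed on are illustrative choices.

```python
def trace_shapes(batch, window, n_features, gru_hidden=64):
    """Return the assumed tensor shape after each stage of the pipeline.

    gru_hidden=64 is an arbitrary, hypothetical hidden size.
    """
    x = (batch, window, n_features)

    # Step 1: 1-D convolution along time; assumed padded so the shape is kept.
    conv = x

    # Step 2: both GAT layers are assumed to return the shape they receive.
    feat_gat = conv
    time_gat = conv

    # Step 3: concatenate conv output and both GAT outputs on the feature axis.
    fused = (batch, window, conv[2] + feat_gat[2] + time_gat[2])

    # Assumption: only the final GRU hidden state is passed to the two heads.
    gru_out = (batch, gru_hidden)

    # Step 4: one forecast per feature, plus a reconstruction of the window.
    forecast = (batch, n_features)
    reconstruction = (batch, window, n_features)

    return {
        "input": x,
        "conv": conv,
        "feat_gat": feat_gat,
        "time_gat": time_gat,
        "fused": fused,
        "gru_out": gru_out,
        "forecast": forecast,
        "reconstruction": reconstruction,
    }


shapes = trace_shapes(batch=2, window=100, n_features=38)
for name, shape in shapes.items():
    print(f"{name:>15}: {shape}")
```

The point to notice is the concatenation in step 3: the feature axis triples before the GRU, while the window length is unchanged.
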
## GAT layers
-
+Below we visualize how the two GAT layers view the input as a complete graph.
Feature-Oriented GAT layer | Time-Oriented GAT layer
--- | ---
|
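
As a rough illustration of the two graph views, the sketch below builds the node sets from a toy window in plain Python. The helper names `graph_views` and `complete_edges` are hypothetical, and leaving out self-loops is an assumption made only for the edge count; an actual attention layer may also attend from a node to itself.

```python
from itertools import permutations


def graph_views(window):
    """Split a window (rows = timestamps, columns = features) into node sets."""
    # Feature-oriented view: one node per feature, holding its values over time.
    feature_nodes = [list(col) for col in zip(*window)]
    # Time-oriented view: one node per timestamp, holding all feature values.
    time_nodes = [list(row) for row in window]
    return feature_nodes, time_nodes


def complete_edges(num_nodes):
    """Directed edges of a complete graph, without self-loops (assumption)."""
    return list(permutations(range(num_nodes), 2))


# Toy window: 3 timestamps, 2 features.
window = [
    [1.0, 10.0],
    [2.0, 20.0],
    [3.0, 30.0],
]
feature_nodes, time_nodes = graph_views(window)
print(feature_nodes)                      # 2 nodes, each a vector of length 3
print(time_nodes)                         # 3 nodes, each a vector of length 2
print(len(complete_edges(len(feature_nodes))))
print(len(complete_edges(len(time_nodes))))
```

The two views are transposes of each other: the same window yields a graph over features in one layer and a graph over timestamps in the other.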