From a01155904920cc9512d945ffe991bae6c300bb3a Mon Sep 17 00:00:00 2001 From: Yury Kashnitsky Date: Mon, 19 Aug 2024 18:36:53 +0200 Subject: [PATCH] add ToC to each lecture --- .../topic01/topic01_pandas_data_analysis.md | 5 +- ...02_additional_seaborn_matplotlib_plotly.md | 7 +- .../topic02/topic02_visual_data_analysis.md | 15 +--- .../topic03/topic03_decision_trees_kNN.md | 9 +-- ...dels_part1_mse_likelihood_bias_variance.md | 8 +-- ..._models_part2_logit_likelihood_learning.md | 8 +-- ...opic4_linear_models_part3_regul_example.md | 5 ++ ..._part4_good_bad_logit_movie_reviews_XOR.md | 7 +- ...near_models_part5_valid_learning_curves.md | 6 ++ .../book/topic05/topic5_part1_bagging.md | 7 +- .../topic05/topic5_part2_random_forest.md | 12 +--- .../topic5_part3_feature_importance.md | 7 +- ...6_feature_engineering_feature_selection.md | 6 ++ .../book/topic07/topic7_pca_clustering.md | 14 +--- .../topic08_sgd_hashing_vowpal_wabbit.md | 69 +++++++------------ .../topic9_part1_time_series_python.md | 29 ++------ .../topic09/topic9_part2_facebook_prophet.md | 14 +--- .../book/topic10/topic10_gradient_boosting.md | 15 +--- 18 files changed, 78 insertions(+), 165 deletions(-) diff --git a/mlcourse_ai_jupyter_book/book/topic01/topic01_pandas_data_analysis.md b/mlcourse_ai_jupyter_book/book/topic01/topic01_pandas_data_analysis.md index 104ba6d84..672cd56c8 100644 --- a/mlcourse_ai_jupyter_book/book/topic01/topic01_pandas_data_analysis.md +++ b/mlcourse_ai_jupyter_book/book/topic01/topic01_pandas_data_analysis.md @@ -29,9 +29,8 @@ _Source: Getty Images_ ## Article outline -1. [Demonstration of the main Pandas methods](#1-demonstration-of-the-main-pandas-methods) -2. [First attempt at predicting telecom churn](#2-first-attempt-at-predicting-telecom-churn) -3. [Useful resources](#3-useful-resources) +```{contents} +``` ## 1. Demonstration of the main Pandas methods diff --git a/mlcourse_ai_jupyter_book/book/topic02/topic02_additional_seaborn_matplotlib_plotly.md b/mlcourse_ai_jupyter_book/book/topic02/topic02_additional_seaborn_matplotlib_plotly.md index 823781e70..9d088c712 100644 --- a/mlcourse_ai_jupyter_book/book/topic02/topic02_additional_seaborn_matplotlib_plotly.md +++ b/mlcourse_ai_jupyter_book/book/topic02/topic02_additional_seaborn_matplotlib_plotly.md @@ -27,11 +27,8 @@ Author: [Egor Polusmak](https://www.linkedin.com/in/egor-polusmak/). Translated ## Article outline -1. [Dataset](1-dataset) -2. [DataFrame.plot()](2-dataframe-plot) -3. [Seaborn](3-seaborn) -4. [Plotly](4-plotly) -5. [Useful resources](5-useful-resources) +```{contents} +``` ## 1. Dataset diff --git a/mlcourse_ai_jupyter_book/book/topic02/topic02_visual_data_analysis.md b/mlcourse_ai_jupyter_book/book/topic02/topic02_visual_data_analysis.md index 720e589ee..d4bebf2ba 100644 --- a/mlcourse_ai_jupyter_book/book/topic02/topic02_visual_data_analysis.md +++ b/mlcourse_ai_jupyter_book/book/topic02/topic02_visual_data_analysis.md @@ -33,19 +33,8 @@ In this article, we are going to get hands-on experience with visual exploration ## Article outline -1. [Dataset](1-dataset) -2. [Univariate visualization](2-univariate-visualization) - * 2.1 [Quantitative features](21-quantitative-features) - * 2.2 [Categorical and binary features](22-categorical-and-binary-features) -3. [Multivariate visualization](3-multivariate-visualization) - * 3.1 [Quantitative vs. Quantitative](31-quantitative-vs-quantitative) - * 3.2 [Quantitative vs. Categorical](32-quantitative-vs-categorical) - * 3.3 [Categorical vs. 
Categorical](33-categorical-vs-categorical) -4. [Whole dataset visualizations](4-whole-dataset-visualizations) - * 4.1 [Naive approach](41-a-naive-approach) - * 4.2 [Dimensionality reduction](42-dimensionality-reduction) - * 4.3 [t-SNE](43-t-SNE) -5. [Useful resources](5-useful-resources) +```{contents} +``` ## 1. Dataset diff --git a/mlcourse_ai_jupyter_book/book/topic03/topic03_decision_trees_kNN.md b/mlcourse_ai_jupyter_book/book/topic03/topic03_decision_trees_kNN.md index a7f600d92..cbc2d8a66 100644 --- a/mlcourse_ai_jupyter_book/book/topic03/topic03_decision_trees_kNN.md +++ b/mlcourse_ai_jupyter_book/book/topic03/topic03_decision_trees_kNN.md @@ -22,13 +22,8 @@ Author: [Yury Kashnitsky](https://yorko.github.io). Translated and edited by [Ch ## Article outline -1. [Introduction](introduction) -2. [Decision Tree](decision-tree) -3. [Nearest Neighbors Method](nearest-neighbors-nethod) -4. [Choosing Model Parameters and Cross-Validation](choosing-model-parameters-and-cross-validation) -5. [Application Examples and Complex Cases](application-examples-and-complex-cases) -6. [Pros and Cons of Decision Trees and the Nearest Neighbors Method](pros-and-cons-of-decision-trees-and-the-nearest-neighbors-method) -7. [Useful resources](useful-resources) +```{contents} +``` ## 1. Introduction diff --git a/mlcourse_ai_jupyter_book/book/topic04/topic4_linear_models_part1_mse_likelihood_bias_variance.md b/mlcourse_ai_jupyter_book/book/topic04/topic4_linear_models_part1_mse_likelihood_bias_variance.md index fab105d21..8cbf90f0a 100644 --- a/mlcourse_ai_jupyter_book/book/topic04/topic4_linear_models_part1_mse_likelihood_bias_variance.md +++ b/mlcourse_ai_jupyter_book/book/topic04/topic4_linear_models_part1_mse_likelihood_bias_variance.md @@ -24,11 +24,9 @@ Author: [Pavel Nesterov](http://pavelnesterov.info/). Translated and edited by [ ## Article outline -1. [Introduction](introduction) -2. [Maximum Likelihood Estimation](maximum-likelihood-estimation) -3. [Bias-Variance Decomposition](bias-variance-decomposition) -4. [Regularization of Linear Regression](regularization-of-linear-regression) -5. [Useful resources](useful-resources) + +```{contents} +``` ## 1. Introduction diff --git a/mlcourse_ai_jupyter_book/book/topic04/topic4_linear_models_part2_logit_likelihood_learning.md b/mlcourse_ai_jupyter_book/book/topic04/topic4_linear_models_part2_logit_likelihood_learning.md index b3b9cb077..39a034d3d 100644 --- a/mlcourse_ai_jupyter_book/book/topic04/topic4_linear_models_part2_logit_likelihood_learning.md +++ b/mlcourse_ai_jupyter_book/book/topic04/topic4_linear_models_part2_logit_likelihood_learning.md @@ -25,11 +25,9 @@ kernelspec: Author: [Yury Kashnitsky](https://yorko.github.io). Translated and edited by [Christina Butsko](https://www.linkedin.com/in/christinabutsko/), [Nerses Bagiyan](https://www.linkedin.com/in/nersesbagiyan/), [Yulia Klimushina](https://www.linkedin.com/in/yuliya-klimushina-7168a9139), and [Yuanyuan Pao](https://www.linkedin.com/in/yuanyuanpao/). This material is subject to the terms and conditions of the [Creative Commons CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/) license. Free use is permitted for any non-commercial purpose. ## Article outline -1. [Linear Classifier](linear-classifier) -2. [Logistic Regression as a Linear Classifier](logistic-regression-as-a-linear-classifier) -3. [Maximum Likelihood Estimation and Logistic Regression](maximum-likelihood-estimation-and-logistic-regression) -4. 
[$L_2$-Regularization of Logistic Loss](l-2-regularization-of-logistic-loss) -5. [Useful resources](useful-resources) + +```{contents} +``` ## 1. Linear Classifier diff --git a/mlcourse_ai_jupyter_book/book/topic04/topic4_linear_models_part3_regul_example.md b/mlcourse_ai_jupyter_book/book/topic04/topic4_linear_models_part3_regul_example.md index 41eb70de8..0cd531a99 100644 --- a/mlcourse_ai_jupyter_book/book/topic04/topic4_linear_models_part3_regul_example.md +++ b/mlcourse_ai_jupyter_book/book/topic04/topic4_linear_models_part3_regul_example.md @@ -24,6 +24,11 @@ kernelspec: Author: [Yury Kashnitsky](https://yorko.github.io). Translated and edited by [Christina Butsko](https://www.linkedin.com/in/christinabutsko/), [Nerses Bagiyan](https://www.linkedin.com/in/nersesbagiyan/), [Yulia Klimushina](https://www.linkedin.com/in/yuliya-klimushina-7168a9139), and [Yuanyuan Pao](https://www.linkedin.com/in/yuanyuanpao/). This material is subject to the terms and conditions of the [Creative Commons CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/) license. Free use is permitted for any non-commercial purpose. +## Article outline + +```{contents} +``` + In the first article, we demonstrated how polynomial features allow linear models to build nonlinear separating surfaces. Let's now show this visually. Let's see how regularization affects the quality of classification on a dataset on microchip testing from Andrew Ng's course on machine learning. We will use logistic regression with polynomial features and vary the regularization parameter $C$. First, we will see how regularization affects the separating border of the classifier and intuitively recognize under- and overfitting. Then, we will choose the regularization parameter to be numerically close to the optimal value via (`cross-validation`) and (`GridSearch`). diff --git a/mlcourse_ai_jupyter_book/book/topic04/topic4_linear_models_part4_good_bad_logit_movie_reviews_XOR.md b/mlcourse_ai_jupyter_book/book/topic04/topic4_linear_models_part4_good_bad_logit_movie_reviews_XOR.md index e045fa615..79f0bbbe5 100644 --- a/mlcourse_ai_jupyter_book/book/topic04/topic4_linear_models_part4_good_bad_logit_movie_reviews_XOR.md +++ b/mlcourse_ai_jupyter_book/book/topic04/topic4_linear_models_part4_good_bad_logit_movie_reviews_XOR.md @@ -25,10 +25,9 @@ Author: [Yury Kashnitsky](https://yorko.github.io). Translated and edited by [Ch ## Article outline -1. [Analysis of IMDB movie reviews](analysis-of-imdb-movie-reviews) -2. [A Simple Word Count](a-simple-word-count) -3. [The XOR Problem](the-xor-problem) -4. [Useful resources](useful-resources) + +```{contents} +``` ## 1. Analysis of IMDB movie reviews diff --git a/mlcourse_ai_jupyter_book/book/topic04/topic4_linear_models_part5_valid_learning_curves.md b/mlcourse_ai_jupyter_book/book/topic04/topic4_linear_models_part5_valid_learning_curves.md index 20ac226d9..74c74fd33 100644 --- a/mlcourse_ai_jupyter_book/book/topic04/topic4_linear_models_part5_valid_learning_curves.md +++ b/mlcourse_ai_jupyter_book/book/topic04/topic4_linear_models_part5_valid_learning_curves.md @@ -24,6 +24,12 @@ kernelspec: Author: [Yury Kashnitsky](https://yorko.github.io). Translated and edited by [Christina Butsko](https://www.linkedin.com/in/christinabutsko/), [Nerses Bagiyan](https://www.linkedin.com/in/nersesbagiyan/), [Yulia Klimushina](https://www.linkedin.com/in/yuliya-klimushina-7168a9139), and [Yuanyuan Pao](https://www.linkedin.com/in/yuanyuanpao/). 
This material is subject to the terms and conditions of the [Creative Commons CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/) license. Free use is permitted for any non-commercial purpose. +## Article outline + +```{contents} +``` + + ```{code-cell} ipython3 import warnings import numpy as np diff --git a/mlcourse_ai_jupyter_book/book/topic05/topic5_part1_bagging.md b/mlcourse_ai_jupyter_book/book/topic05/topic5_part1_bagging.md index 27feb567c..e7f7c826f 100644 --- a/mlcourse_ai_jupyter_book/book/topic05/topic5_part1_bagging.md +++ b/mlcourse_ai_jupyter_book/book/topic05/topic5_part1_bagging.md @@ -23,11 +23,8 @@ Authors: [Vitaliy Radchenko](https://www.linkedin.com/in/vitaliyradchenk0/), and ## Article outline -1. [Ensembles](ensembles) -2. [Bootstrapping](bootstrapping) -3. [Bagging](bagging) -4. [Out-of-bag error](out-of-bag-error) -5. [Useful resources](useful-resources) +```{contents} +``` $\DeclareMathOperator{\Var}{Var}$ $\DeclareMathOperator{\Cov}{Cov}$ diff --git a/mlcourse_ai_jupyter_book/book/topic05/topic5_part2_random_forest.md b/mlcourse_ai_jupyter_book/book/topic05/topic5_part2_random_forest.md index 1ae88b1a9..7daa0933d 100644 --- a/mlcourse_ai_jupyter_book/book/topic05/topic5_part2_random_forest.md +++ b/mlcourse_ai_jupyter_book/book/topic05/topic5_part2_random_forest.md @@ -24,16 +24,8 @@ Authors: [Vitaliy Radchenko](https://www.linkedin.com/in/vitaliyradchenk0/), and ## Article outline -1. [Algorithm](algorithm) -2. [Comparison with Decision Trees and Bagging](comparison-with-decision-trees-and-bagging) -3. [Parameters](parameters) -4. [Variance and Decorrelation](variance-and-decorrelation) -5. [Bias](bias) -6. [Extremely Randomized Trees](extremely-randomized-trees) -7. [Similarities between Random Forest and k-Nearest Neighbors](similarities-between-random-forest-and-k-nearest-neighbors) -8. [Transformation of a dataset into a high-dimensional representation](transformation-of-a-dataset-into-a-high-dimensional-representation) -9. [Pros and cons of random forests](pros-and-cons-of-random-forests) -10. [Useful resources](useful-resources) +```{contents} +``` $\DeclareMathOperator{\Var}{Var}$ $\DeclareMathOperator{\Cov}{Cov}$ diff --git a/mlcourse_ai_jupyter_book/book/topic05/topic5_part3_feature_importance.md b/mlcourse_ai_jupyter_book/book/topic05/topic5_part3_feature_importance.md index 70d38544d..445397f62 100644 --- a/mlcourse_ai_jupyter_book/book/topic05/topic5_part3_feature_importance.md +++ b/mlcourse_ai_jupyter_book/book/topic05/topic5_part3_feature_importance.md @@ -23,11 +23,8 @@ Authors: [Vitaliy Radchenko](https://www.linkedin.com/in/vitaliyradchenk0/), [Yu ## Article outline -1. [Intuition](intuition) -2. [Illustrating permutation importance](illustrating-permutation-importance) -3. [Sklearn Random Forest Feature Importance](sklearn-random-forest-feature-importance) -4. [Practical example](practical-example) -5. [Useful resources](useful-resources) +```{contents} +``` It's quite often that you want to make out the exact reasons of the algorithm outputting a particular answer. Or at the very least to find out which input features contributed most to the result. With Random Forest, you can obtain such information quite easily. 
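For instance, scikit-learn's random forests expose impurity-based importances through the `feature_importances_` attribute. Below is a minimal sketch on a synthetic toy dataset (illustrative only, not the dataset used later in the article):

```{code-cell} ipython3
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# toy data, purely to demonstrate the attribute
X, y = make_classification(n_samples=500, n_features=6, n_informative=3, random_state=17)

forest = RandomForestClassifier(n_estimators=100, random_state=17).fit(X, y)

# one importance value per feature; values are non-negative and sum to 1
print(forest.feature_importances_)
```
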
diff --git a/mlcourse_ai_jupyter_book/book/topic06/topic6_feature_engineering_feature_selection.md b/mlcourse_ai_jupyter_book/book/topic06/topic6_feature_engineering_feature_selection.md index 1761a37d7..fdf587127 100644 --- a/mlcourse_ai_jupyter_book/book/topic06/topic6_feature_engineering_feature_selection.md +++ b/mlcourse_ai_jupyter_book/book/topic06/topic6_feature_engineering_feature_selection.md @@ -20,6 +20,12 @@ kernelspec: Author: [Arseny Kravchenko](https://arseny.info/pages/about_me.html#about_me). Translated and edited by [Christina Butsko](https://www.linkedin.com/in/christinabutsko/), [Yury Kashnitsky](https://yorko.github.io/), [Egor Polusmak](https://www.linkedin.com/in/egor-polusmak/), [Anastasia Manokhina](https://www.linkedin.com/in/anastasiiamanokhina/), [Anna Larionova](https://www.linkedin.com/in/anna-larionova-74434689/), [Evgeny Sushko](https://www.linkedin.com/in/evgenysushko/) and [Yuanyuan Pao](https://www.linkedin.com/in/yuanyuanpao/). This material is subject to the terms and conditions of the [Creative Commons CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/) license. Free use is permitted for any non-commercial purpose. + +## Article outline + +```{contents} +``` + In this course, we have already seen several key machine learning algorithms. However, before moving on to the more fancy ones, we’d like to take a small detour and talk about data preparation. The well-known concept of “garbage in — garbage out” applies 100% to any task in machine learning. Any experienced professional can recall numerous times when a simple model trained on high-quality data was proven to be better than a complicated multi-model ensemble built on data that wasn’t clean. To start, I wanted to review three similar but different tasks: diff --git a/mlcourse_ai_jupyter_book/book/topic07/topic7_pca_clustering.md b/mlcourse_ai_jupyter_book/book/topic07/topic7_pca_clustering.md index 8fdc56a81..ce2e09d2c 100644 --- a/mlcourse_ai_jupyter_book/book/topic07/topic7_pca_clustering.md +++ b/mlcourse_ai_jupyter_book/book/topic07/topic7_pca_clustering.md @@ -28,17 +28,9 @@ In this lesson, we will work with unsupervised learning methods such as Principa ## Article outline -1. [Introduction](introduction) -2. [Principal Component Analysis (PCA)](principal-component-analysis-pca) - - [Intuition, theories, and application issues](intuition-theories-and-application-issues) - - [Examples](examples) -3. [Clustering](clustering) - - [K-means](k-means) - - [Affinity Propagation](affinity-propagation) - - [Spectral clustering](spectral-clustering) - - [Agglomerative clustering](agglomerative-clustering) - - [Accuracy metrics](accuracy-metrics) -4. [Useful links](useful-links) + +```{contents} +``` ## 1. Introduction diff --git a/mlcourse_ai_jupyter_book/book/topic08/topic08_sgd_hashing_vowpal_wabbit.md b/mlcourse_ai_jupyter_book/book/topic08/topic08_sgd_hashing_vowpal_wabbit.md index 5fee9eb01..794a95a85 100644 --- a/mlcourse_ai_jupyter_book/book/topic08/topic08_sgd_hashing_vowpal_wabbit.md +++ b/mlcourse_ai_jupyter_book/book/topic08/topic08_sgd_hashing_vowpal_wabbit.md @@ -24,20 +24,9 @@ Author: [Yury Kashnitsky](https://yorko.github.io). Translated and edited by [Se This week, we'll cover two reasons for Vowpal Wabbit’s exceptional training speed, namely, online learning and hashing trick, in both theory and practice. We will try it out with news, movie reviews, and StackOverflow questions. ## Article outline -1. 
[Stochastic gradient descent and online learning](stochastic-gradient-descent-and-online-learning) - - 1.1. [SGD](stochastic-gradient-descent) - - 1.2. [Online approach to learning](online-approach-to-learning) -2. [Categorical feature processing](categorical-feature-processing) - - 2.1. [Label Encoding](label-encoding) - - 2.2. [One-Hot Encoding](one-hot-encoding) - - 2.3. [Hashing trick](hashing-trick) -3. [Vowpal Wabbit](vowpal-Wabbit) - - 3.1. [News. Binary classification](news-binary-classification) - - 3.2. [News. Multiclass classification](news-multiclass-classification) - - 3.3. [IMDB movie reviews](imdb-movie-reviews) - - 3.4. [Classifying gigabytes of StackOverflow questions](classifying-gigabytes-of-stackoverflow-questionss) -4. [Useful resources](useful-resources) +```{contents} +``` ```{code-cell} ipython3 @@ -61,7 +50,9 @@ import seaborn as sns ``` ## 1. Stochastic gradient descent and online learning +(stochastic-gradient-descent-and-online-learning)= ### 1.1. Stochastic gradient descent +(stochastic-gradient-descent)= Despite the fact that gradient descent is one of the first things learned in machine learning and optimization courses, it is one of its modifications, Stochastic Gradient Descent (SGD), that is hard to top. @@ -145,6 +136,7 @@ Andrew Ng has a good illustration of this in his [machine learning course](https These are the contour plots for some function, and we want to find the global minimum of this function. The red curve shows weight changes (in this picture, $\theta_0$ and $\theta_1$ correspond to our $w_0$ and $w_1$). According to the properties of a gradient, the direction of change at every point is orthogonal to contour plots. With stochastic gradient descent, weights are changing in a less predictable manner, and it even may seem that some steps are wrong by leading away from minima; however, both procedures converge to the same solution. ### 1.2. Online approach to learning +(online-approach-to-learning)= Stochastic gradient descent gives us practical guidance for training both classifiers and regressors with large amounts of data up to hundreds of GBs (depending on computational resources). Considering the case of paired regression, we can store the training data set $(X,y)$ in HDD without loading it into RAM (where it simply won't fit), read objects one by one, and update the weights of our model: @@ -350,7 +342,7 @@ Shell is the main interface for VW. ```{code-cell} ipython3 -#!vw --help | head +!vw --help | head ``` Vowpal Wabbit reads data from files or from standard input stream (stdin) with the following format: @@ -462,18 +454,14 @@ Now, we pass the created training file to Vowpal Wabbit. We solve the classifica ``` -#!vw -d $PATH_TO_WRITE_DATA/20news_train.vw \ -# --loss_function hinge -f $PATH_TO_WRITE_DATA/20news_model.vw +!vw -d $PATH_TO_WRITE_DATA/20news_train.vw --loss_function hinge -f $PATH_TO_WRITE_DATA/20news_model.vw ``` VW prints a lot of interesting info while training (one can suppress it with the `--quiet` parameter). You can see [documentation](https://vowpalwabbit.org/docs/vowpal_wabbit/python/latest/tutorials/cmd_linear_regression.html#vowpal-wabbit-output) of the diagnostic output. Note how average loss drops while training. For loss computation, VW uses samples it has never seen before, so this measure is usually accurate. 
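As an aside, this "score each example before learning from it" idea (progressive validation) is easy to emulate with scikit-learn's `SGDClassifier` and `partial_fit`. A rough sketch on a synthetic stream of examples, not on the newsgroups data used above, assuming the same hinge loss:

```{code-cell} ipython3
import numpy as np
from sklearn.linear_model import SGDClassifier

# synthetic stream of examples, purely for illustration
rng = np.random.RandomState(17)
X_stream = rng.randn(2000, 10)
y_stream = (X_stream[:, 0] + 0.5 * X_stream[:, 1] > 0).astype(int)

clf = SGDClassifier(loss="hinge", random_state=17)
mistakes = 0
for i in range(len(X_stream)):
    x_i, y_i = X_stream[i : i + 1], y_stream[i : i + 1]
    if i > 0:
        # evaluate on the example before the model has learned from it
        mistakes += int(clf.predict(x_i)[0] != y_i[0])
    clf.partial_fit(x_i, y_i, classes=[0, 1])

print("progressive validation error rate:", round(mistakes / (len(X_stream) - 1), 3))
```
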
Now, we apply our trained model to the test set, saving predictions into a file with the `-p` flag: - ``` -#!vw -i $PATH_TO_WRITE_DATA/20news_model.vw -t -d $PATH_TO_WRITE_DATA/20news_test.vw \ -# -p $PATH_TO_WRITE_DATA/20news_test_predictions.txt -``` - +!vw -i $PATH_TO_WRITE_DATA/20news_model.vw -t -d $PATH_TO_WRITE_DATA/20news_test.vw -p $PATH_TO_WRITE_DATA/20news_test_predictions.txt +``` Now we load our predictions, compute AUC, and plot the ROC curve: @@ -500,7 +488,6 @@ The AUC value we get shows that we have achieved high classification quality. We will use the same news dataset, but, this time, we will solve a multiclass classification problem. `Vowpal Wabbit` is a little picky – it wants labels starting from 1 till K, where K – is the number of classes in the classification task (20 in our case). So we will use LabelEncoder and add 1 afterwards (recall that `LabelEncoder` maps labels into range from 0 to K-1). - ```{code-cell} ipython3 all_documents = newsgroups["data"] topic_encoder = LabelEncoder() @@ -531,19 +518,14 @@ We train Vowpal Wabbit in multiclass classification mode, passing the `oaa` para Additionally, we can try automatic Vowpal Wabbit parameter tuning with [Hyperopt](https://github.com/hyperopt/hyperopt). - ``` -#!vw --oaa 20 $PATH_TO_WRITE_DATA/20news_train_mult.vw -f $PATH_TO_WRITE_DATA/ \ -#20news_model_mult.vw --loss_function=hinge +!vw --oaa 20 $PATH_TO_WRITE_DATA/20news_train_mult.vw -f $PATH_TO_WRITE_DATA/20news_model_mult.vw --loss_function=hinge ``` ``` -#%%time -#!vw -i $PATH_TO_WRITE_DATA/20news_model_mult.vw -t -d $PATH_TO_WRITE_DATA/20news_test_mult.vw \ -#-p $PATH_TO_WRITE_DATA/20news_test_predictions_mult.txt +!vw -i $PATH_TO_WRITE_DATA/20news_model_mult.vw -t -d $PATH_TO_WRITE_DATA/20news_test_mult.vw -p $PATH_TO_WRITE_DATA/20news_test_predictions_mult.txt ``` - ```{code-cell} ipython3 with open( os.path.join(PATH_TO_WRITE_DATA, "20news_test_predictions_mult.txt") @@ -551,13 +533,10 @@ with open( test_prediction_mult = [float(label) for label in pred_file.readlines()] ``` - ```{code-cell} ipython3 accuracy_score(test_labels_mult, test_prediction_mult) ``` - - Here is how often the model misclassifies atheism with other topics: @@ -720,8 +699,8 @@ with open(os.path.join(PATH_TO_WRITE_DATA, "movie_reviews_test.vw"), "w") as vw_ ```{code-cell} ipython3 -#!vw -d $PATH_TO_WRITE_DATA/movie_reviews_train.vw --loss_function hinge \ -#-f $PATH_TO_WRITE_DATA/movie_reviews_model.vw --quiet +!vw -d $PATH_TO_WRITE_DATA/movie_reviews_train.vw --loss_function hinge \ +-f $PATH_TO_WRITE_DATA/movie_reviews_model.vw --quiet ``` Next, make the hold-out prediction with the following VW arguments: @@ -732,8 +711,8 @@ Next, make the hold-out prediction with the following VW arguments: ```{code-cell} ipython3 -#!vw -i $PATH_TO_WRITE_DATA/movie_reviews_model.vw -t \ -#-d $PATH_TO_WRITE_DATA/movie_reviews_valid.vw -p $PATH_TO_WRITE_DATA/movie_valid_pred.txt --quiet +!vw -i $PATH_TO_WRITE_DATA/movie_reviews_model.vw -t \ +-d $PATH_TO_WRITE_DATA/movie_reviews_valid.vw -p $PATH_TO_WRITE_DATA/movie_valid_pred.txt --quiet ``` Read the predictions from the text file and estimate the accuracy and ROC AUC. Note that VW prints probability estimates of the +1 class. These estimates are distributed from -1 to 1, so we can convert these into binary answers, assuming that positive values belong to class 1. @@ -759,9 +738,9 @@ Again, do the same for the test set. 
```{code-cell} ipython3 -#!vw -i $PATH_TO_WRITE_DATA/movie_reviews_model.vw -t \ -#-d $PATH_TO_WRITE_DATA/movie_reviews_test.vw \ -#-p $PATH_TO_WRITE_DATA/movie_test_pred.txt --quiet +!vw -i $PATH_TO_WRITE_DATA/movie_reviews_model.vw -t \ +-d $PATH_TO_WRITE_DATA/movie_reviews_test.vw \ +-p $PATH_TO_WRITE_DATA/movie_test_pred.txt --quiet ``` @@ -787,14 +766,14 @@ Let's try to achieve a higher accuracy by incorporating bigrams. ```{code-cell} ipython3 -#!vw -d $PATH_TO_WRITE_DATA/movie_reviews_train.vw \ -# --loss_function hinge --ngram 2 -f $PATH_TO_WRITE_DATA/movie_reviews_model2.vw --quiet +!vw -d $PATH_TO_WRITE_DATA/movie_reviews_train.vw \ + --loss_function hinge --ngram 2 -f $PATH_TO_WRITE_DATA/movie_reviews_model2.vw --quiet ``` ```{code-cell} ipython3 -#!vw -i$PATH_TO_WRITE_DATA/movie_reviews_model2.vw -t -d $PATH_TO_WRITE_DATA/movie_reviews_valid.vw \ -#-p $PATH_TO_WRITE_DATA/movie_valid_pred2.txt --quiet +!vw -i$PATH_TO_WRITE_DATA/movie_reviews_model2.vw -t -d $PATH_TO_WRITE_DATA/movie_reviews_valid.vw \ +-p $PATH_TO_WRITE_DATA/movie_valid_pred2.txt --quiet ``` @@ -817,8 +796,8 @@ print("AUC: {}".format(round(roc_auc_score(valid_labels, valid_prediction), 3))) ```{code-cell} ipython3 -#!vw -i $PATH_TO_WRITE_DATA/movie_reviews_model2.vw -t -d $PATH_TO_WRITE_DATA/movie_reviews_test.vw \ -#-p $PATH_TO_WRITE_DATA/movie_test_pred2.txt --quiet +!vw -i $PATH_TO_WRITE_DATA/movie_reviews_model2.vw -t -d $PATH_TO_WRITE_DATA/movie_reviews_test.vw \ +-p $PATH_TO_WRITE_DATA/movie_test_pred2.txt --quiet ``` diff --git a/mlcourse_ai_jupyter_book/book/topic09/topic9_part1_time_series_python.md b/mlcourse_ai_jupyter_book/book/topic09/topic9_part1_time_series_python.md index e2695f915..422bbbcdb 100644 --- a/mlcourse_ai_jupyter_book/book/topic09/topic9_part1_time_series_python.md +++ b/mlcourse_ai_jupyter_book/book/topic09/topic9_part1_time_series_python.md @@ -26,25 +26,10 @@ We continue our open machine learning course with a new article on time series. Let's take a look at how to work with time series in Python: what methods and models we can use for prediction, what double and triple exponential smoothing is, what to do if stationarity is not your favorite thing, how to build SARIMA and stay alive, how to make predictions using xgboost... In addition, all of this will be applied to (harsh) real world examples. -# Article outline -1. [Introduction](introduction) - - [Forecast quality metrics](forecast-quality-metrics) -2. [Move, smoothe, evaluate](move-smoothe-evaluate) - - Rolling window estimations - - Exponential smoothing, Holt-Winters model - - Time-series cross validation, parameters selection -3. [Econometric approach](econometric-approach) - - Stationarity, unit root - - Getting rid of non-stationarity - - SARIMA intuition and model building -4. [Linear (and not only) models for time series](linear-and-not-only-models-for-time-series) - - [Feature extraction](feature-extraction) - - [Time series lags](time-series-lags) - - [Target encoding](target-encoding) - - [Regularization and feature selection](regularization-and-feature-selection) - - [Boosting](boosting) -5. [Conclusion](conclusion) -6. [Useful resources](useful-resources) +## Article outline + +```{contents} +``` In my day-to-day job, I encounter time-series related tasks almost every day. The most frequent questions asked are the following: what will happen with our metrics in the next day/week/month/etc., how many users will install our app, how much time will they spend online, how many actions will users complete, and so on. 
We can approach these prediction tasks using different methods depending on the required quality of the prediction, length of the forecast period, and, of course, the time within which we have to choose features and tune parameters to achieve desired results. @@ -727,7 +712,7 @@ def plotHoltWinters(series, plot_intervals=False, plot_anomalies=False): plt.title("Mean Absolute Percentage Error: {0:.2f}%".format(error)) if plot_anomalies: - anomalies = np.array([np.NaN] * len(series)) + anomalies = np.array([np.nan] * len(series)) anomalies[series.values < model.LowerBond[: len(series)]] = series.values[ series.values < model.LowerBond[: len(series)] ] @@ -1110,7 +1095,7 @@ def plotSARIMA(series, model, n_steps): data["arima_model"] = model.fittedvalues # making a shift on s+d steps, because these values were unobserved by the model # due to the differentiating - data["arima_model"][: s + d] = np.NaN + data["arima_model"][: s + d] = np.nan # forecasting on n_steps forward forecast = model.predict(start=data.shape[0], end=data.shape[0] + n_steps) @@ -1260,7 +1245,7 @@ def plotModelResults( plt.plot(upper, "r--", alpha=0.5) if plot_anomalies: - anomalies = np.array([np.NaN] * len(y_test)) + anomalies = np.array([np.nan] * len(y_test)) anomalies[y_test < lower] = y_test[y_test < lower] anomalies[y_test > upper] = y_test[y_test > upper] plt.plot(anomalies, "o", markersize=10, label="Anomalies") diff --git a/mlcourse_ai_jupyter_book/book/topic09/topic9_part2_facebook_prophet.md b/mlcourse_ai_jupyter_book/book/topic09/topic9_part2_facebook_prophet.md index 43406164d..633948043 100644 --- a/mlcourse_ai_jupyter_book/book/topic09/topic9_part2_facebook_prophet.md +++ b/mlcourse_ai_jupyter_book/book/topic09/topic9_part2_facebook_prophet.md @@ -32,18 +32,8 @@ In this article, we will look at [Prophet](https://facebook.github.io/prophet/), ## Article outline -1. Introduction -2. [The Prophet Forecasting Model](the-prophet-forecasting-model) -3. [Practice with Prophet](practice-with-facebook-prophet) - * 3.1 Installation in Python - * 3.2 Dataset - * 3.3 Exploratory visual analysis - * 3.4 Making a forecast - * 3.5 Forecast quality evaluation - * 3.6 Visualization -4. [Box-Cox Transformation](box-cox-transformation) -5. [Summary](summary) -6. [References](references) +```{contents} +``` ## 1. Introduction diff --git a/mlcourse_ai_jupyter_book/book/topic10/topic10_gradient_boosting.md b/mlcourse_ai_jupyter_book/book/topic10/topic10_gradient_boosting.md index 1d785653d..2a92b6b1f 100644 --- a/mlcourse_ai_jupyter_book/book/topic10/topic10_gradient_boosting.md +++ b/mlcourse_ai_jupyter_book/book/topic10/topic10_gradient_boosting.md @@ -29,19 +29,8 @@ Today we are going to have a look at one of the most popular and practical machi ## Article outline We recommend going over this article in the order described below, but feel free to jump around between sections. -1. [Introduction and history of boosting](introduction-and-history-of-boosting) - - [History of Gradient Boosting Machine](history-of-gbm) -1. [GBM algorithm](gbm-algorithm) - - [ML Problem statement](ml-problem-statement) - - [Functional gradient descent](functional-gradient-descent) - - [Friedman's classic GBM algorithm](friedmans-classic-gbm-algorithm) - - [Step-by-step example of the GBM algorithm](step-by-step-example-how-gbm-works) -1. [Loss functions](loss-functions) - - [Regression loss functions](regression-loss-functions) - - [Classification loss functions](classification-loss-functions) - - [Weights](weights) -1. [Conclusion](4conclusion) -1. 
[Useful resources](useful-resources) +```{contents} +``` ## 1. Introduction and history of boosting Almost everyone in machine learning has heard about gradient boosting. Many data scientists include this algorithm in their data scientist's toolbox because of the good results it yields on any given (unknown) problem.