Skip to content

Commit

Permalink
Add files via upload
Browse files Browse the repository at this point in the history
  • Loading branch information
TheKidPadra authored May 5, 2023
1 parent a2f38a6 commit 64fee57
Show file tree
Hide file tree
Showing 14 changed files with 10,814 additions and 0 deletions.
1,173 changes: 1,173 additions & 0 deletions 4. Sequences, Time Serirs and Prediction/C4W1_Assignment.ipynb

Large diffs are not rendered by default.

766 changes: 766 additions & 0 deletions 4. Sequences, Time Serirs and Prediction/C4W2_Assignment.ipynb

Large diffs are not rendered by default.

957 changes: 957 additions & 0 deletions 4. Sequences, Time Serirs and Prediction/C4W3_Assignment.ipynb

Large diffs are not rendered by default.

1,017 changes: 1,017 additions & 0 deletions 4. Sequences, Time Serirs and Prediction/C4W4_Assignment.ipynb

Large diffs are not rendered by default.

Large diffs are not rendered by default.

Large diffs are not rendered by default.

Original file line number Diff line number Diff line change
@@ -0,0 +1,360 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "gVWCXht3Wuhm"
},
"source": [
"<a href=\"https://colab.research.google.com/github/https-deeplearning-ai/tensorflow-1-public/blob/main/C4/W2/ungraded_labs/C4_W2_Lab_1_features_and_labels.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "VwpmsHAGCThh"
},
"source": [
"# Ungraded Lab: Preparing Time Series Features and Labels\n",
"\n",
"In this lab, you will prepare time series data into features and labels that you can use to train a model. This is mainly achieved by a *windowing* technique where in you group consecutive measurement values into one feature and the next measurement will be the label. For example, in hourly measurements, you can use values taken at hours 1 to 11 to predict the value at hour 12. The next sections will show how you can implement this in Tensorflow. \n",
"\n",
"Let's begin!"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "8KGZ4YDEEA9s"
},
"source": [
"## Imports\n",
"\n",
"Tensorflow will be your lone import in this module and you'll be using methods mainly from the [tf.data API](https://www.tensorflow.org/guide/data), particularly the [tf.data.Dataset](https://www.tensorflow.org/api_docs/python/tf/data/Dataset) class. This contains many useful methods to arrange sequences of data and you'll see that shortly."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "mBw-_CJVEDxY"
},
"outputs": [],
"source": [
"import tensorflow as tf"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "xBUvK1ATDR2L"
},
"source": [
"## Create a Simple Dataset\n",
"\n",
"For this exercise, you will just use a sequence of numbers as your dataset so you can clearly see the effect of each command. For example, the cell below uses the [range()](https://www.tensorflow.org/api_docs/python/tf/data/Dataset#range) method to generate a dataset containing numbers 0 to 9."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "asEdslR_05O_"
},
"outputs": [],
"source": [
"# Generate a tf dataset with 10 elements (i.e. numbers 0 to 9)\n",
"dataset = tf.data.Dataset.range(10)\n",
"\n",
"# Preview the result\n",
"for val in dataset:\n",
" print(val.numpy())"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "7ci0BvcW0VM-"
},
"source": [
"You will see this command several times in the next sections."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "j3BpTbsvGbgn"
},
"source": [
"## Windowing the data\n",
"\n",
"As mentioned earlier, you want to group consecutive elements of your data and use that to predict a future value. This is called windowing and you can use that with the [window()](https://www.tensorflow.org/api_docs/python/tf/data/Dataset#window) method as shown below. Here, you will take 5 elements per window (i.e. `size` parameter) and you will move this window 1 element at a time (i.e. `shift` parameter). One caveat to using this method is that each window returned is a [Dataset](https://www.tensorflow.org/guide/data#dataset_structure) in itself. This is a Python iterable and, as of the current version (TF 2.8), it won't show the elements if you use the `print()` method on it. It will just show a description of the data structure (e.g. `<_VariantDataset shapes: (), types: tf.int64>`)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "Lrv_ghSt1lgQ"
},
"outputs": [],
"source": [
"# Generate a tf dataset with 10 elements (i.e. numbers 0 to 9)\n",
"dataset = tf.data.Dataset.range(10)\n",
"\n",
"# Window the data\n",
"dataset = dataset.window(size=5, shift=1)\n",
"\n",
"# Print the result\n",
"for window_dataset in dataset:\n",
" print(window_dataset)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "SfnpaHHVXu4f"
},
"source": [
"If you want to see the elements, you will have to iterate over each iterable. This can be done by modifying the print statement above with a nested for-loop or list comprehension. The code below shows the list comprehension while in the lecture video, you saw the for-loop."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "vpL6Bsm7W0xx"
},
"outputs": [],
"source": [
"# Print the result\n",
"for window_dataset in dataset:\n",
" print([item.numpy() for item in window_dataset])"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "2U91p2SoIaTC"
},
"source": [
"Now that you can see the elements of each window, you'll notice that the resulting sets are not sized evenly because there are no more elements after the number `9`. You can use the `drop_remainder` flag to make sure that only 5-element windows are retained."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "QLEq6MG-2DN2"
},
"outputs": [],
"source": [
"# Generate a tf dataset with 10 elements (i.e. numbers 0 to 9)\n",
"dataset = tf.data.Dataset.range(10)\n",
"\n",
"# Window the data but only take those with the specified size\n",
"dataset = dataset.window(size=5, shift=1, drop_remainder=True)\n",
"\n",
"# Print the result\n",
"for window_dataset in dataset:\n",
" print([item.numpy() for item in window_dataset])"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "B6DL74dqMu3T"
},
"source": [
"## Flatten the Windows\n",
"\n",
"In training the model later, you will want to prepare the windows to be [tensors](https://www.tensorflow.org/guide/tensor) instead of the `Dataset` structure. You can do that by feeding a mapping function to the [flat_map()](https://www.tensorflow.org/api_docs/python/tf/data/Dataset#flat_map) method. This function will be applied to each window and the results will be [flattened into a single dataset](https://www.tensorflow.org/api_docs/python/tf/data/Dataset#flatten_a_dataset_of_windows_2). To illustrate, the code below will put all elements of a window into a single batch then flatten the result."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "PJ9CAHlJ2ODe"
},
"outputs": [],
"source": [
"# Generate a tf dataset with 10 elements (i.e. numbers 0 to 9)\n",
"dataset = tf.data.Dataset.range(10)\n",
"\n",
"# Window the data but only take those with the specified size\n",
"dataset = dataset.window(5, shift=1, drop_remainder=True)\n",
"\n",
"# Flatten the windows by putting its elements in a single batch\n",
"dataset = dataset.flat_map(lambda window: window.batch(5))\n",
"\n",
"# Print the results\n",
"for window in dataset:\n",
" print(window.numpy())"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "nxMA2L7IMx7V"
},
"source": [
"## Group into features and labels\n",
"\n",
"Next, you will want to mark the labels in each window. For this exercise, you will do that by splitting the last element of each window from the first four. This is done with the [map()](https://www.tensorflow.org/api_docs/python/tf/data/Dataset#map) method containing a lambda function that defines the window slicing."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "DryEZ2Mz2nNV"
},
"outputs": [],
"source": [
"# Generate a tf dataset with 10 elements (i.e. numbers 0 to 9)\n",
"dataset = tf.data.Dataset.range(10)\n",
"\n",
"# Window the data but only take those with the specified size\n",
"dataset = dataset.window(5, shift=1, drop_remainder=True)\n",
"\n",
"# Flatten the windows by putting its elements in a single batch\n",
"dataset = dataset.flat_map(lambda window: window.batch(5))\n",
"\n",
"# Create tuples with features (first four elements of the window) and labels (last element)\n",
"dataset = dataset.map(lambda window: (window[:-1], window[-1]))\n",
"\n",
"# Print the results\n",
"for x,y in dataset:\n",
" print(\"x = \", x.numpy())\n",
" print(\"y = \", y.numpy())\n",
" print()"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "TnueY7A6NFdg"
},
"source": [
"## Shuffle the data\n",
"\n",
"It is good practice to shuffle your dataset to reduce *sequence bias* while training your model. This refers to the neural network overfitting to the order of inputs and consequently, it will not perform well when it does not see that particular order when testing. You don't want the sequence of training inputs to impact the network this way so it's good to shuffle them up. \n",
"\n",
"You can simply use the [shuffle()](https://www.tensorflow.org/api_docs/python/tf/data/Dataset#shuffle) method to do this. The `buffer_size` parameter is required for that and as mentioned in the doc, you should put a number equal or greater than the total number of elements for better shuffling. We can see from the previous cells that the total number of windows in the dataset is `6` so we can choose this number or higher."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "1tl-0BOKkEtk"
},
"outputs": [],
"source": [
"# Generate a tf dataset with 10 elements (i.e. numbers 0 to 9)\n",
"dataset = tf.data.Dataset.range(10)\n",
"\n",
"# Window the data but only take those with the specified size\n",
"dataset = dataset.window(5, shift=1, drop_remainder=True)\n",
"\n",
"# Flatten the windows by putting its elements in a single batch\n",
"dataset = dataset.flat_map(lambda window: window.batch(5))\n",
"\n",
"# Create tuples with features (first four elements of the window) and labels (last element)\n",
"dataset = dataset.map(lambda window: (window[:-1], window[-1]))\n",
"\n",
"# Shuffle the windows\n",
"dataset = dataset.shuffle(buffer_size=10)\n",
"\n",
"# Print the results\n",
"for x,y in dataset:\n",
" print(\"x = \", x.numpy())\n",
" print(\"y = \", y.numpy())\n",
" print()\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "9Wr4jGaTNIk4"
},
"source": [
"## Create batches for training\n",
"\n",
"Lastly, you will want to group your windows into batches. You can do that with the [batch()](https://www.tensorflow.org/api_docs/python/tf/data/Dataset#batch) method as shown below. Simply specify the batch size and it will return a batched dataset with that number of windows. As a rule of thumb, it is also good to specify a [prefetch()](https://www.tensorflow.org/api_docs/python/tf/data/Dataset#prefetch) step. This optimizes the execution time when the model is already training. By specifying a prefetch `buffer_size` of `1` as shown below, Tensorflow will prepare the next one batch in advance (i.e. putting it in a buffer) while the current batch is being consumed by the model. You can read more about it [here](https://towardsdatascience.com/optimising-your-input-pipeline-performance-with-tf-data-part-1-32e52a30cac4#Prefetching)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "Wa0PNwxMGapy"
},
"outputs": [],
"source": [
"# Generate a tf dataset with 10 elements (i.e. numbers 0 to 9)\n",
"dataset = tf.data.Dataset.range(10)\n",
"\n",
"# Window the data but only take those with the specified size\n",
"dataset = dataset.window(5, shift=1, drop_remainder=True)\n",
"\n",
"# Flatten the windows by putting its elements in a single batch\n",
"dataset = dataset.flat_map(lambda window: window.batch(5))\n",
"\n",
"# Create tuples with features (first four elements of the window) and labels (last element)\n",
"dataset = dataset.map(lambda window: (window[:-1], window[-1]))\n",
"\n",
"# Shuffle the windows\n",
"dataset = dataset.shuffle(buffer_size=10)\n",
"\n",
"# Create batches of windows\n",
"dataset = dataset.batch(2).prefetch(1)\n",
"\n",
"# Print the results\n",
"for x,y in dataset:\n",
" print(\"x = \", x.numpy())\n",
" print(\"y = \", y.numpy())\n",
" print()\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "7YiIH06unP1W"
},
"source": [
"## Wrap Up\n",
"\n",
"This short exercise showed you how to chain different methods of the `tf.data.Dataset` class to prepare a sequence into shuffled and batched window datasets. You will be using this same concept in the next exercises when you apply it to synthetic data and use the result to train a neural network. On to the next!"
]
}
],
"metadata": {
"colab": {
"collapsed_sections": [],
"name": "C4_W2_Lab_1_features_and_labels.ipynb",
"private_outputs": true,
"provenance": []
},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.4"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
Loading

0 comments on commit 64fee57

Please sign in to comment.