From 678dbd45f79d276b6a02f0fc0e65f66d052267fe Mon Sep 17 00:00:00 2001 From: Nupur Lal Date: Thu, 6 Feb 2025 11:15:06 +0000 Subject: [PATCH 1/3] new recipe notebook for kmeans and kmeanspredict function --- .../KMeans_KMeansPredict.ipynb | 419 ++++++++++++++++++ 1 file changed, 419 insertions(+) create mode 100644 Recipes/ClearScape_Functions/KMeans_KMeansPredict.ipynb diff --git a/Recipes/ClearScape_Functions/KMeans_KMeansPredict.ipynb b/Recipes/ClearScape_Functions/KMeans_KMeansPredict.ipynb new file mode 100644 index 00000000..185ca99d --- /dev/null +++ b/Recipes/ClearScape_Functions/KMeans_KMeansPredict.ipynb @@ -0,0 +1,419 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "bc549e6c-0cc4-4188-94a3-a9bdd3ae3dfa", + "metadata": {}, + "source": [ + "
\n", + "

\n", + " KMeans and KMeansPredict function in Vantage\n", + "
\n", + " \"Teradata\"\n", + "

\n", + "
" + ] + }, + { + "cell_type": "markdown", + "id": "7ae7611a-0795-4168-b716-01fee6880cbd", + "metadata": {}, + "source": [ + "

Introduction

\n", + "

The K-means() function groups a set of observations into k clusters in which each observation belongs to the cluster with the nearest mean (cluster centers or cluster centroid). KMeansPredict uses the k-means algorithm to predict the target class of unseen or new data. In this notebook we will see how we can use the KMeans and KMeansPredict function available in Vantage.

" + ] + }, + { + "cell_type": "markdown", + "id": "6b3a00b4-6661-4c91-9b2d-cb7b0b403140", + "metadata": {}, + "source": [ + "
\n", + "1. Initiate a connection to Vantage" + ] + }, + { + "cell_type": "markdown", + "id": "2346857f-e0d3-488a-8a3f-ac6dff752c2b", + "metadata": {}, + "source": [ + "

In the section, we import the required libraries and set environment variables and environment paths (if required)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c5af5af3-29d5-4f6a-8334-9df6924e7787", + "metadata": {}, + "outputs": [], + "source": [ + "from teradataml import *\n", + "\n", + "# Modify the following to match the specific client environment settings\n", + "display.max_rows = 5" + ] + }, + { + "cell_type": "markdown", + "id": "ad3dd7b4-831c-4fb3-ab71-719c8c99a71c", + "metadata": {}, + "source": [ + "


\n", + "

1.1 Connect to Vantage

\n", + "

You will be prompted to provide the password. Enter your password, press the Enter key, and then use the down arrow to go to the next cell.

" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2742444c-4349-4b0f-b4e5-b068a8785cd9", + "metadata": {}, + "outputs": [], + "source": [ + "%run -i ../../UseCases/startup.ipynb\n", + "eng = create_context(host = 'host.docker.internal', username='demo_user', password = password)\n", + "print(eng)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "01c4a128-d106-46ea-8dee-34acc5abd29f", + "metadata": {}, + "outputs": [], + "source": [ + "%%capture\n", + "execute_sql('''SET query_band='DEMO=PP_KMeans_KMeansPredict_Python.ipynb;' UPDATE FOR SESSION; ''')" + ] + }, + { + "cell_type": "markdown", + "id": "efe2fd2d-63ff-4278-9157-8b9110d682e8", + "metadata": {}, + "source": [ + "

Begin running steps with Shift + Enter keys.

" + ] + }, + { + "cell_type": "markdown", + "id": "f003f332-7489-4bdd-a740-4af2a0a22280", + "metadata": {}, + "source": [ + "
\n", + "\n", + "

1.2 Getting Data for This Demo

\n", + "\n", + "

Here, we will get the data which is available in the teradataml library and use the same to show the usage of the function.

" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "45c86176-734c-4b1c-ace0-d0c88657b4f8", + "metadata": {}, + "outputs": [], + "source": [ + "load_example_data(\"kmeans\", \"computers_train1\")" + ] + }, + { + "cell_type": "markdown", + "id": "2401d6d3-4fcd-46fc-8a94-7cafcd1258b0", + "metadata": {}, + "source": [ + "

Next is an optional step – if you want to see the status of databases/tables created and space used.

" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "87429200-db02-450d-9472-4d1e2030124d", + "metadata": {}, + "outputs": [], + "source": [ + "%run -i ../../UseCases/run_procedure.py \"call space_report();\" # Takes 10 seconds" + ] + }, + { + "cell_type": "markdown", + "id": "2a3762ac-ba27-4fa3-adba-d577262a4290", + "metadata": {}, + "source": [ + "
\n", + "2. Data Exploration\n", + "

Create a \"Virtual DataFrame\" that points to the data set in Vantage. Check the shape of the dataframe as check the datatype of all the columns of the dataframe.

" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3d936fab-7ca7-4e94-ba64-95c1da08b74f", + "metadata": {}, + "outputs": [], + "source": [ + "# Create teradataml DataFrame objects.\n", + "computers_train = DataFrame.from_table(\"computers_train1\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3c5a0992-f651-49bc-9080-828fd9c0c982", + "metadata": {}, + "outputs": [], + "source": [ + "computers_train" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "96070c57-9a6b-451d-aa05-4ea3825a4bff", + "metadata": {}, + "outputs": [], + "source": [ + "computers_train.shape" + ] + }, + { + "cell_type": "markdown", + "id": "0a70cf26-0b67-4acd-a4a1-28881016ca39", + "metadata": {}, + "source": [ + "
\n", + "

2.1 KMeans Function

" + ] + }, + { + "cell_type": "markdown", + "id": "ffef7ff5-0fad-4e47-a415-a1d58e0b37fd", + "metadata": {}, + "source": [ + "

We want to divide our data into two clusters, we will use KMeans function for this.
\n", + "Detailed help can be found by passing function name to built-in help function.

" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "703f9e25-6785-4f91-a22b-3bc491b046a2", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "help(KMeans)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7de768e5-826f-499e-95f3-4d231d4c1c7b", + "metadata": {}, + "outputs": [], + "source": [ + "KMeans_out = KMeans(id_column=\"id\",\n", + " target_columns=['price', 'speed'],\n", + " data=computers_train,\n", + " num_clusters=2)\n", + "# Print the result DataFrame\n", + "KMeans_out.result" + ] + }, + { + "cell_type": "markdown", + "id": "8bddd414-903c-4c64-8f67-2323d332908b", + "metadata": {}, + "source": [ + "

We can also specify initial centroid information instead of number of clusters.

" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f042acff-500e-46f4-9f5c-437476aa3cc7", + "metadata": {}, + "outputs": [], + "source": [ + "kmeans_initial_centroids_table = computers_train.loc[[19, 97]]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e07260ab-61a3-45a0-9536-c4df4971d7ed", + "metadata": {}, + "outputs": [], + "source": [ + "kmeans_initial_centroids_table" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8675e314-4b55-4265-8d78-9ac8e3c9661e", + "metadata": {}, + "outputs": [], + "source": [ + "KMeans_out_1 = KMeans(id_column=\"id\",\n", + " target_columns=['price', 'speed'],\n", + " data=computers_train,\n", + " centroids_data=kmeans_initial_centroids_table)\n", + " \n", + " # Print the result DataFrames.\n", + "KMeans_out_1.result" + ] + }, + { + "cell_type": "markdown", + "id": "49fbbae3-aecb-4237-85ed-9586cbdc1a4f", + "metadata": {}, + "source": [ + "
\n", + "

2.2 KMeansPredict Function

" + ] + }, + { + "cell_type": "markdown", + "id": "ab6a9a63-7f1e-4964-b3f5-bfedd7a4c28d", + "metadata": {}, + "source": [ + "

We can assign the input data points to the cluster centroid using the model generated by the KMeans() function in KMeansPredict() function.
Detailed help can be found by passing function name to built-in help function.

\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5c092ed4-c489-43a7-bce0-ffaf2a974539", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "help(KMeansPredict)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "fedfa443-e84e-4eb0-8517-cfba00371401", + "metadata": {}, + "outputs": [], + "source": [ + "KMeansPredict_out = KMeansPredict(data=computers_train,\n", + " object=KMeans_out.result\n", + " )\n", + " \n", + "# Print the result DataFrames.\n", + "KMeansPredict_out.result" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1a0adb0e-a721-43e3-842d-19dfa47d3ab7", + "metadata": {}, + "outputs": [], + "source": [ + "KMeansPredict_out_1 = KMeansPredict(data=computers_train,\n", + " object=KMeans_out_1,\n", + " accumulate=[\"ram\",\"price\",\"speed\"],\n", + " output_distance=True\n", + " )\n", + " \n", + "# Print the result DataFrames.\n", + "KMeansPredict_out_1.result" + ] + }, + { + "cell_type": "markdown", + "id": "151d5db4-29a9-49d9-8a61-d53f9627a294", + "metadata": {}, + "source": [ + "
\n", + "3. Cleanup" + ] + }, + { + "cell_type": "markdown", + "id": "a562f058-fb24-4966-a25d-f2960e6ddfb8", + "metadata": {}, + "source": [ + "
\n", + "

Databases and Tables

\n", + "

The following code will clean up tables and databases created above.

" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e6b3935b-47c2-4a96-bec2-68106d172116", + "metadata": {}, + "outputs": [], + "source": [ + "db_drop_table(\"computers_train1\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "157fe3d4-4e0e-4d92-b343-9f758f3bf690", + "metadata": {}, + "outputs": [], + "source": [ + "remove_context()" + ] + }, + { + "cell_type": "markdown", + "id": "4317a6cf-1479-4aa8-b30a-ee0a3b5231a8", + "metadata": {}, + "source": [ + "
\n", + "\n", + "

Links:

\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "b2dcca28-5de5-44d7-88cb-45a12153b3f8", + "metadata": {}, + "source": [ + "" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.10" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} From c7eb8e706a5d28d614840fae55ce7b392fc119ca Mon Sep 17 00:00:00 2001 From: nupur-lal Date: Tue, 25 Feb 2025 06:24:18 +0000 Subject: [PATCH 2/3] dark theme changes --- .../KMeans_KMeansPredict.ipynb | 62 +++++++++---------- 1 file changed, 31 insertions(+), 31 deletions(-) diff --git a/Recipes/ClearScape_Functions/KMeans_KMeansPredict.ipynb b/Recipes/ClearScape_Functions/KMeans_KMeansPredict.ipynb index 185ca99d..d74861ce 100644 --- a/Recipes/ClearScape_Functions/KMeans_KMeansPredict.ipynb +++ b/Recipes/ClearScape_Functions/KMeans_KMeansPredict.ipynb @@ -19,8 +19,8 @@ "id": "7ae7611a-0795-4168-b716-01fee6880cbd", "metadata": {}, "source": [ - "

Introduction

\n", - "

The K-means() function groups a set of observations into k clusters in which each observation belongs to the cluster with the nearest mean (cluster centers or cluster centroid). KMeansPredict uses the k-means algorithm to predict the target class of unseen or new data. In this notebook we will see how we can use the KMeans and KMeansPredict function available in Vantage.

" + "

Introduction

\n", + "

The K-means() function groups a set of observations into k clusters in which each observation belongs to the cluster with the nearest mean (cluster centers or cluster centroid). KMeansPredict uses the k-means algorithm to predict the target class of unseen or new data. In this notebook we will see how we can use the KMeans and KMeansPredict function available in Vantage.

" ] }, { @@ -28,8 +28,8 @@ "id": "6b3a00b4-6661-4c91-9b2d-cb7b0b403140", "metadata": {}, "source": [ - "
\n", - "1. Initiate a connection to Vantage" + "
\n", + "1. Initiate a connection to Vantage" ] }, { @@ -37,7 +37,7 @@ "id": "2346857f-e0d3-488a-8a3f-ac6dff752c2b", "metadata": {}, "source": [ - "

In the section, we import the required libraries and set environment variables and environment paths (if required)." + "

In the section, we import the required libraries and set environment variables and environment paths (if required)." ] }, { @@ -58,9 +58,9 @@ "id": "ad3dd7b4-831c-4fb3-ab71-719c8c99a71c", "metadata": {}, "source": [ - "


\n", - "

1.1 Connect to Vantage

\n", - "

You will be prompted to provide the password. Enter your password, press the Enter key, and then use the down arrow to go to the next cell.

" + "
\n", + "

1.1 Connect to Vantage

\n", + "

You will be prompted to provide the password. Enter your password, press the Enter key, and then use the down arrow to go to the next cell.

" ] }, { @@ -91,7 +91,7 @@ "id": "efe2fd2d-63ff-4278-9157-8b9110d682e8", "metadata": {}, "source": [ - "

Begin running steps with Shift + Enter keys.

" + "

Begin running steps with Shift + Enter keys.

" ] }, { @@ -99,11 +99,11 @@ "id": "f003f332-7489-4bdd-a740-4af2a0a22280", "metadata": {}, "source": [ - "
\n", + "
\n", "\n", - "

1.2 Getting Data for This Demo

\n", + "

1.2 Getting Data for This Demo

\n", "\n", - "

Here, we will get the data which is available in the teradataml library and use the same to show the usage of the function.

" + "

Here, we will get the data which is available in the teradataml library and use the same to show the usage of the function.

" ] }, { @@ -121,7 +121,7 @@ "id": "2401d6d3-4fcd-46fc-8a94-7cafcd1258b0", "metadata": {}, "source": [ - "

Next is an optional step – if you want to see the status of databases/tables created and space used.

" + "

Next is an optional step – if you want to see the status of databases/tables created and space used.

" ] }, { @@ -139,9 +139,9 @@ "id": "2a3762ac-ba27-4fa3-adba-d577262a4290", "metadata": {}, "source": [ - "
\n", - "2. Data Exploration\n", - "

Create a \"Virtual DataFrame\" that points to the data set in Vantage. Check the shape of the dataframe as check the datatype of all the columns of the dataframe.

" + "
\n", + "2. Data Exploration\n", + "

Create a \"Virtual DataFrame\" that points to the data set in Vantage. Check the shape of the dataframe as check the datatype of all the columns of the dataframe.

" ] }, { @@ -180,8 +180,8 @@ "id": "0a70cf26-0b67-4acd-a4a1-28881016ca39", "metadata": {}, "source": [ - "
\n", - "

2.1 KMeans Function

" + "
\n", + "

2.1 KMeans Function

" ] }, { @@ -189,7 +189,7 @@ "id": "ffef7ff5-0fad-4e47-a415-a1d58e0b37fd", "metadata": {}, "source": [ - "

We want to divide our data into two clusters, we will use KMeans function for this.
\n", + "

We want to divide our data into two clusters, we will use KMeans function for this.
\n", "Detailed help can be found by passing function name to built-in help function.

" ] }, @@ -225,7 +225,7 @@ "id": "8bddd414-903c-4c64-8f67-2323d332908b", "metadata": {}, "source": [ - "

We can also specify initial centroid information instead of number of clusters.

" + "

We can also specify initial centroid information instead of number of clusters.

" ] }, { @@ -269,8 +269,8 @@ "id": "49fbbae3-aecb-4237-85ed-9586cbdc1a4f", "metadata": {}, "source": [ - "
\n", - "

2.2 KMeansPredict Function

" + "
\n", + "

2.2 KMeansPredict Function

" ] }, { @@ -278,7 +278,7 @@ "id": "ab6a9a63-7f1e-4964-b3f5-bfedd7a4c28d", "metadata": {}, "source": [ - "

We can assign the input data points to the cluster centroid using the model generated by the KMeans() function in KMeansPredict() function.
Detailed help can be found by passing function name to built-in help function.

\n" + "

We can assign the input data points to the cluster centroid using the model generated by the KMeans() function in KMeansPredict() function.
Detailed help can be found by passing function name to built-in help function.

\n" ] }, { @@ -330,8 +330,8 @@ "id": "151d5db4-29a9-49d9-8a61-d53f9627a294", "metadata": {}, "source": [ - "
\n", - "3. Cleanup" + "
\n", + "3. Cleanup" ] }, { @@ -339,9 +339,9 @@ "id": "a562f058-fb24-4966-a25d-f2960e6ddfb8", "metadata": {}, "source": [ - "
\n", - "

Databases and Tables

\n", - "

The following code will clean up tables and databases created above.

" + "
\n", + "

Databases and Tables

\n", + "

The following code will clean up tables and databases created above.

" ] }, { @@ -369,9 +369,9 @@ "id": "4317a6cf-1479-4aa8-b30a-ee0a3b5231a8", "metadata": {}, "source": [ - "
\n", + "
\n", "\n", - "

Links:

\n", + "

Links:

\n", "