content/learning-paths/servers-and-cloud-computing/spark-on-azure/_index.md
+9 -5 (9 additions, 5 deletions)
@@ -1,22 +1,26 @@
 ---
 title: Run Spark applications on the Microsoft Azure Cobalt 100 processors

+draft: true
+cascade:
+    draft: true
+
 minutes_to_complete: 60

-who_is_this_for: This Learning Path introduces Spark deployment on Microsoft Azure Cobalt 100 (Arm-based) virtual machines. It is designed for developers migrating Spark applications from x86_64 to Arm with minimal or no changes.
+who_is_this_for: This is an advanced topic that introduces Spark deployment on Microsoft Azure Cobalt 100 (Arm-based) virtual machines. It is designed for developers migrating Spark applications from x86_64 to Arm.

 learning_objectives:
-- Provision an Azure Arm64 virtual machine using Azure console, with Ubuntu as the base image.
+- Provision an Azure Arm64 virtual machine using the Azure console.
 - Learn how to create an Azure Linux 3.0 Docker container.
-- Deploy a Spark application inside an Azure Linux 3.0 Arm64-based Docker container and an Azure Linux 3.0 custom-image based Azure virtual machine.
-- Perform Spark benchmarking inside the container as well as the custom virtual machine.
+- Deploy a Spark application inside an Azure Linux 3.0 Arm64-based Docker container or an Azure Linux 3.0 custom-image-based Azure virtual machine.
+- Run a suite of Spark benchmarks to understand and evaluate performance on the Azure Cobalt 100 virtual machine.

 prerequisites:
 - A [Microsoft Azure](https://azure.microsoft.com/) account with access to Cobalt 100 based instances (Dpsv6).
 - A machine with [Docker](/install-guides/docker/) installed.
 - Familiarity with distributed computing concepts and the [Apache Spark architecture](https://spark.apache.org/docs/latest/).
content/learning-paths/servers-and-cloud-computing/spark-on-azure/background.md
+3 -3 (3 additions, 3 deletions)
@@ -1,14 +1,14 @@
 ---
-title: "About Cobalt 100 Arm-based processor and Apache Spark"
+title: "Overview"

 weight: 2

 layout: "learningpathall"
 ---

-## What is Cobalt 100 Arm-based processor?
+## What is the Azure Cobalt 100 processor?

-Azure’s Cobalt 100 is built on Microsoft's first-generation, in-house Arm-based processor: the Cobalt 100. Designed entirely by Microsoft and based on Arm’s NeoverseN2 architecture, this 64-bit CPU delivers improved performance and energy efficiency across a broad spectrum of cloud-native, scale-out Linux workloads. These include web and application servers, data analytics, open-source databases, caching systems, and more. Running at 3.4 GHz, the Cobalt 100 processor allocates a dedicated physical core for each vCPU, ensuring consistent and predictable performance.
+Azure’s Cobalt 100 is built on Microsoft's first-generation Arm-based processor: the Cobalt 100. Designed entirely by Microsoft and based on Arm’s Neoverse-N2 architecture, this 64-bit CPU delivers improved performance and energy efficiency across a broad spectrum of cloud-native, scale-out Linux workloads. These include web and application servers, data analytics, open-source databases, caching systems, and more. Running at 3.4 GHz, the Cobalt 100 processor allocates a dedicated physical core for each vCPU, ensuring consistent and predictable performance.

 To learn more about Cobalt 100, refer to the blog [Announcing the preview of new Azure virtual machine based on the Azure Cobalt 100 processor](https://techcommunity.microsoft.com/blog/azurecompute/announcing-the-preview-of-new-azure-vms-based-on-the-azure-cobalt-100-processor/4146353).
content/learning-paths/servers-and-cloud-computing/spark-on-azure/baseline.md
+3 -3 (3 additions, 3 deletions)
@@ -1,16 +1,16 @@
 ---
-title: Baseline Testing
+title: Functional Validation
 weight: 6

 ### FIXED, DO NOT MODIFY
 layout: learningpathall
 ---


-## Baseline Testing
+## Functional Validation
 Since Apache Spark is installed successfully on your Arm virtual machine, let's now perform simple baseline testing to validate that Spark runs correctly and gives expected output.

-Run a simple PySpark script, create a file named `test_spark.py`, and add the below content to it:
+Using a file editor of your choice, create a file named `test_spark.py` and add the following content to it:
-This executes the **JoinBenchmark**, which measures the performance of various SQL join operations (e.g., SortMergeJoin, BroadcastHashJoin) under different query plans. It helps evaluate how Spark SQL optimizes and executes join strategies, especially with and without WholeStageCodegen, a technique that compiles entire query stages into efficient bytecode for faster execution.
+This executes the `JoinBenchmark`, which measures the performance of various SQL join operations (e.g., SortMergeJoin, BroadcastHashJoin) under different query plans. It helps evaluate how Spark SQL optimizes and executes join strategies, especially with and without WholeStageCodegen, a technique that compiles entire query stages into efficient bytecode for faster execution.

-You should see an output similar to:
+The output should look similar to:
 ```output
 [info] Running benchmark: Join w long
 [info] Running case: Join w long wholestage off
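The `JoinBenchmark` description above names two physical join strategies, SortMergeJoin and BroadcastHashJoin. As a rough mental model only, the toy Python sketch below contrasts them on small in-memory lists. It is not Spark's implementation: the function names and the `(key, payload)` row shape are hypothetical, and real Spark operators work on partitioned data (a broadcast hash join ships the smaller side to every executor, while a sort-merge join shuffles both sides by key and sorts each partition).

```python
# Toy, single-process illustration of two join strategies named in JoinBenchmark.
# NOT Spark's implementation; function names and row shape are hypothetical.
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[int, str]          # (join key, payload)
Joined = Tuple[int, str, str]  # (key, left payload, right payload)


def broadcast_hash_join(large: List[Row], small: List[Row]) -> List[Joined]:
    """Build a hash table on the small ("broadcast") side, then probe it once
    per row of the large side. Neither input needs to be sorted."""
    table: Dict[int, List[str]] = defaultdict(list)
    for key, payload in small:
        table[key].append(payload)
    out: List[Joined] = []
    for key, payload in large:
        # .get() avoids inserting empty lists for keys with no match
        for other in table.get(key, []):
            out.append((key, payload, other))
    return out


def sort_merge_join(left: List[Row], right: List[Row]) -> List[Joined]:
    """Sort both sides by key, then walk them in lockstep, emitting every
    pairing within each run of equal keys."""
    left = sorted(left, key=lambda r: r[0])
    right = sorted(right, key=lambda r: r[0])
    i = j = 0
    out: List[Joined] = []
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the end of the run of equal keys on each side.
            i_end = i
            while i_end < len(left) and left[i_end][0] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == rk:
                j_end += 1
            for _, lp in left[i:i_end]:
                for _, rp in right[j:j_end]:
                    out.append((lk, lp, rp))
            i, j = i_end, j_end
    return out


if __name__ == "__main__":
    facts = [(1, "a"), (2, "b"), (2, "c"), (3, "d")]
    dims = [(2, "x"), (3, "y"), (4, "z")]
    # Both print [(2, 'b', 'x'), (2, 'c', 'x'), (3, 'd', 'y')]
    print(sorted(broadcast_hash_join(facts, dims)))
    print(sort_merge_join(facts, dims))
```

Both strategies produce the same rows; they differ in cost profile. The hash join avoids sorting but needs the small side to fit in memory, while the sort-merge join scales to two large inputs at the price of sorting.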
@@ -183,36 +183,9 @@ You should see an output similar to:
 Benchmarking was performed in both an Azure Linux 3.0 Docker container and an Azure Linux 3.0 virtual machine. The benchmark results were found to be comparable.
 {{% /notice %}}

-Accordingly, this Learning path includes benchmark results from virtual machines only, for both x86 and Arm64 platforms.
-### Benchmark summary on x86_64:
-The following benchmark results are collected on an x86_64 **D4s_v4 Azure virtual machine using the Azure Linux 3.0 image published by Ntegral Inc**.
-| Benchmark | Wholestage | Best Time (ms) | Avg Time (ms) | Stdev (ms) | Rate (M/s) | Per Row (ns) | Relative |
-The following benchmark results were collected on an Arm64 **D4ps_v6 Azure virtual machine created from a custom Azure Linux 3.0 image using the AArch64 ISO**.
+For easier comparison, here is a summary of benchmark results collected on an Arm64 `D4ps_v6` Azure virtual machine created from a custom Azure Linux 3.0 image using the AArch64 ISO.
 | Benchmark | Wholestage | Best Time (ms) | Avg Time (ms) | Stdev (ms) | Rate (M/s) | Per Row (ns) | Relative |
-### **Highlights from Azure Linux Arm64 virtual machine**
+### Benchmark summary on x86_64:
+Here is a summary of the benchmark results collected on an `x86_64` `D4s_v4` Azure virtual machine using the Azure Linux 3.0 image published by Ntegral Inc.
+| Benchmark | Wholestage | Best Time (ms) | Avg Time (ms) | Stdev (ms) | Rate (M/s) | Per Row (ns) | Relative |
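The summary tables above use the same column layout as the output of Spark's internal benchmark harness. Assuming they follow that harness's conventions (Rate = rows processed per microsecond of the best run, which equals millions of rows per second; Per Row = the best run's time per row in nanoseconds; Relative = the first case's best time divided by this case's best time), the derived columns can be recomputed, or compared across the `x86_64` and Arm64 tables, with a small helper like the one below. The function names are hypothetical, and the row count and timings in the example are placeholders, not measured results.

```python
# Sketch: recompute derived benchmark columns from "Best Time (ms)".
# ASSUMPTION: columns follow Spark's benchmark-harness conventions described above.
from dataclasses import dataclass


@dataclass
class CaseResult:
    name: str
    best_ms: float  # value from the "Best Time (ms)" column


def rate_m_per_s(num_rows: int, best_ms: float) -> float:
    """Rows per microsecond of the best run == millions of rows per second."""
    return num_rows / (best_ms * 1000.0)


def per_row_ns(num_rows: int, best_ms: float) -> float:
    """Nanoseconds per row for the best run (equivalently 1000 / rate)."""
    return best_ms * 1e6 / num_rows


def relative(first_best_ms: float, best_ms: float) -> float:
    """Speedup of this case versus the first case in the same benchmark group."""
    return first_best_ms / best_ms


def platform_speedup(baseline_ms: float, candidate_ms: float) -> float:
    """Cross-table comparison of the same case: > 1.0 means the candidate
    (for example the Arm64 VM) was faster than the baseline (for example x86_64)."""
    return baseline_ms / candidate_ms


if __name__ == "__main__":
    n = 1_000_000  # hypothetical row count
    cases = [CaseResult("wholestage off", 10.0), CaseResult("wholestage on", 5.0)]
    first = cases[0].best_ms
    for c in cases:
        print(
            f"{c.name:15s} rate={rate_m_per_s(n, c.best_ms):.1f} M/s "
            f"per_row={per_row_ns(n, c.best_ms):.1f} ns "
            f"relative={relative(first, c.best_ms):.1f}X"
        )
```

With these placeholder numbers, the "wholestage on" case reports 200.0 M/s, 5.0 ns per row, and 2.0X, which is how a WholeStageCodegen speedup shows up in the Relative column. Use `platform_speedup` on the Best Time of matching rows from the two tables rather than on their Relative values, since Relative is only meaningful within one benchmark group.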
content/learning-paths/servers-and-cloud-computing/spark-on-azure/container-setup.md
+11 -11 (11 additions, 11 deletions)
@@ -7,28 +7,28 @@ layout: learningpathall
 ---


-You have an option to choose between working with the Azure Linux 3.0 Docker image or inside the virtual machine created with the OS image.
+You can deploy your Spark workload either in an Azure Linux 3.0 Docker container or on a virtual machine created from a custom Azure Linux 3.0 image.

 ### Working inside Azure Linux 3.0 Docker container
-The Azure Linux Container Host is an operating system image that's optimized for running container workloads on Azure Kubernetes Service (AKS). Microsoft maintains the Azure Linux Container Host and based it on CBL-Mariner, an open-source Linux distribution created by Microsoft. To know more about Azure Linux 3.0, kindly refer [What is Azure Linux Container Host for AKS](https://learn.microsoft.com/en-us/azure/azure-linux/intro-azure-linux).
+The Azure Linux Container Host is an operating system image that's optimized for running container workloads on Azure Kubernetes Service (AKS). Microsoft maintains the Azure Linux Container Host and based it on CBL-Mariner, an open-source Linux distribution created by Microsoft. To learn more about Azure Linux 3.0, refer to [What is Azure Linux Container Host for AKS](https://learn.microsoft.com/en-us/azure/azure-linux/intro-azure-linux).

-Azure Linux 3.0 offers support for AArch64. However, the standalone virtual machine image for Azure Linux 3.0 or CBL Mariner 3.0 is not available for Arm. Hence, to use the default software stack provided by the Microsoft team, you can create a docker container with Azure Linux 3.0 as a base image, and run the Spark application inside the container.
+Azure Linux 3.0 offers support for AArch64. However, the standalone virtual machine image for Azure Linux 3.0 or CBL Mariner 3.0 is not available for Arm. To use the default software stack provided by Microsoft, you can run a Docker container with Azure Linux 3.0 as the base image and run the Spark application inside the container.

-#### Create Azure Linux 3.0 Docker Container
+#### Option 1: Run an Azure Linux 3.0 Docker container
 The [Microsoft Artifact Registry](https://mcr.microsoft.com/en-us/artifact/mar/azurelinux/base/core/about) offers updated docker image for the Azure Linux 3.0.

-To create a docker container, install docker, and then follow the below instructions:
+To run a Docker container with Azure Linux 3.0, install [Docker](/install-guides/docker/docker-engine/) and then run the following command:

 ```console
 sudo docker run -it --rm mcr.microsoft.com/azurelinux/base/core:3.0
 ```
-The default container startup command is bash. tdnf and dnf are the default package managers.
+The container starts with a bash shell. `tdnf` and `dnf` are the default package managers available in the container.

-### Working with Azure Linux 3.0 OS image
-As of now, the Azure Marketplace offers official virtual machine images of Azure Linux 3.0 only for x64-based architectures, published by Ntegral Inc. However, native Arm64 (AArch64) images are not yet officially available. Hence, for this Learning Path, you can create your own custom Azure Linux 3.0 virtual machine image for AArch64 using the [AArch64 ISO for Azure Linux 3.0](https://github.com/microsoft/azurelinux#iso).
+### Option 2: Create a virtual machine instance with an Azure Linux 3.0 OS image
+As of now, the Azure Marketplace offers official virtual machine images of Azure Linux 3.0 only for `x86_64`-based architectures, published by Ntegral Inc. While native Arm64 (AArch64) images are not yet officially available, you can create your own custom Azure Linux 3.0 virtual machine image for AArch64 using the [AArch64 ISO for Azure Linux 3.0](https://github.com/microsoft/azurelinux#iso).

-Refer [Create an Azure Linux 3.0 virtual machine with Cobalt 100 processors](https://learn.arm.com/learning-paths/servers-and-cloud-computing/azure-vm) for the details.
+Refer to [Create an Azure Linux 3.0 virtual machine with Cobalt 100 processors](/learning-paths/servers-and-cloud-computing/azure-vm) for the detailed steps.

-Whether you're using an Azure Linux 3.0 Docker container, or a virtual machine created from a custom Azure Linux 3.0 image, the deployment and benchmarking steps remain the same.
+Whether you choose an Azure Linux 3.0 Docker container or a virtual machine created from a custom Azure Linux 3.0 image, the Spark deployment and benchmarking steps in the following sections are the same.

-Once the setup has been established, you can proceed with the Spark Installation ahead.
+Once the setup is complete, you can proceed with installing and running Spark in the next section.
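Whichever option you pick, it can be worth confirming that the environment really is Arm64 before installing Spark. A minimal sketch, assuming `python3` is available (in the Azure Linux 3.0 container it can be installed with, for example, `tdnf install -y python3`), asks the Python standard library for the machine architecture and the CPU count the OS reports. The helper names are hypothetical.

```python
# Sketch: confirm the container or VM is running on Arm64 before installing Spark.
import os
import platform


def describe_host() -> dict:
    """Collect basic facts about the machine this interpreter runs on."""
    return {
        "machine": platform.machine(),  # 'aarch64' on Arm64 Linux, 'x86_64' on Intel/AMD Linux
        "system": platform.system(),    # 'Linux' inside an Azure Linux 3.0 container or VM
        "cpus": os.cpu_count(),         # logical CPUs reported by the OS (may be None)
    }


def is_arm64(machine: str) -> bool:
    """Linux reports 'aarch64'; some other platforms report 'arm64' or 'ARM64'."""
    return machine.lower() in ("aarch64", "arm64")


if __name__ == "__main__":
    info = describe_host()
    print(info)
    print("Arm64 host" if is_arm64(info["machine"]) else "Not an Arm64 host")
```

When run on a Cobalt 100 VM or in a container on one, `machine` should be `aarch64`. If it reports `x86_64`, the container is running on an x86 host, and benchmark results from it should not be read as Arm64 numbers.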