Azure-Synapse-Analytics-PoC


Description

Create a Synapse Analytics environment based on best practices to achieve a successful proof of concept. While individual settings can be adjusted, the major deployment difference is whether or not you use Private Endpoints for connectivity. If you do not already use Private Endpoints for other Azure deployments, using them for a proof of concept is discouraged: they carry more networking dependencies than can be configured here.

How to Run

These commands should be executed from the Azure Cloud Shell at https://shell.azure.com using PowerShell:

rm -rf Azure-Synapse-Analytics-PoC
git clone https://github.com/tonio-lora/Azure-Synapse-Analytics-PoC  
cd Azure-Synapse-Analytics-PoC  
bash setup.sh 
bash configure.sh 
./upload_sql_scripts.ps1
  • There are a few variables in terraform.tfvars that can optionally be updated to reflect your environment (e.g. synapse_azure_ad_admin_upn) before you run the setup.sh script; a sketch of one such edit follows this list.
  • setup.sh is the bash script that uses Terraform to deploy the environment, while configure.sh performs post-deployment configuration that cannot be done with Terraform.
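
If you prefer to set the Synapse Azure AD admin without opening an editor, here is a minimal sketch, assuming the variable is assigned on a single line in terraform.tfvars and using a hypothetical UPN:

# Hypothetical UPN: replace with your own Azure AD account
sed -i 's|^synapse_azure_ad_admin_upn.*|synapse_azure_ad_admin_upn = "poc.admin@contoso.com"|' terraform.tfvars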

What's Deployed

Azure Synapse Analytics Workspace

  • DW1000 Dedicated SQL Pool
  • Sample SQL Scripts and Spark Notebooks
  • Metadata-driven Data Loader pipeline to quickly onboard Parquet files available in the Data Lake

Azure Data Lake Storage Gen2

  • config container for Azure Synapse Analytics Workspace
  • data container for queried/ingested data, including AdventureWorksDW2019 in Parquet format (a quick way to verify the staged files is sketched below)
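
To confirm the staged data landed, you can list the files in the data container from the Cloud Shell. A minimal sketch, assuming a hypothetical storage account name (the actual name is generated during deployment) and that your account has data-plane access to the storage account:

# Hypothetical storage account name; requires a Storage Blob Data role on the account
az storage fs file list --account-name pocdatalake --file-system data --auth-mode login --output table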

Azure Log Analytics

  • Logging and telemetry for Azure Synapse Analytics
  • Logging and telemetry for Azure Data Lake Storage Gen2 (a sample query against the workspace is sketched below)
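
Once diagnostics begin to flow, you can query the Log Analytics workspace directly from the Cloud Shell. A minimal sketch, assuming a hypothetical workspace GUID and that the SynapseSqlPoolExecRequests table is being populated by the Dedicated SQL Pool diagnostics:

# Hypothetical workspace GUID; find yours on the Log Analytics workspace blade
az monitor log-analytics query \
  --workspace "00000000-0000-0000-0000-000000000000" \
  --analytics-query "SynapseSqlPoolExecRequests | take 10" \
  --output table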

What's Configured

  • Enable Result Set Caching
  • Create a pipeline to auto pause/resume the Dedicated SQL Pool (the manual equivalent is sketched after this list)
  • Feature flag to enable/disable Private Endpoints
  • Serverless SQL Demo Data Database
  • Proper service and user permissions for Azure Synapse Analytics Workspace and Azure Data Lake Storage Gen2
  • Parquet Auto Ingestion pipeline to optimize data ingestion using best practices
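
The auto pause/resume pipeline automates what you can also do by hand. A minimal sketch of the manual equivalent, using hypothetical pool, workspace, and resource group names:

# Hypothetical resource names; substitute the ones created by your deployment
az synapse sql pool pause --name DataWarehouse --workspace-name pocsynapse --resource-group PoC-Synapse-Analytics
az synapse sql pool resume --name DataWarehouse --workspace-name pocsynapse --resource-group PoC-Synapse-Analytics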

Optional Steps

  • Load the sample Parquet files into the Dedicated SQL Pool. If you have additional files, just add them to the Parquet_Auto_Ingestion_Metadata.csv stored in the data container (an upload sketch follows this list)
  • Download the sample Power BI file from the Azure Cloud Shell and change its connection to point at your new Synapse workspace. This sample file includes a report that uses the tables loaded in the previous step
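
A minimal sketch of both optional steps from the Cloud Shell, using hypothetical file and storage account names (download is the Cloud Shell built-in for pulling a file to your local machine, available at least in the bash experience):

# Hypothetical names; point --file at your own Parquet file and updated metadata CSV
az storage blob upload --account-name pocdatalake --container-name data \
  --name myfile.parquet --file ./myfile.parquet --auth-mode login
# Hypothetical .pbix name; use the sample report file shipped with this repo
download Azure-Synapse-Analytics-PoC/PoC_Sample_Report.pbix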
