Dynamic SQL in Databricks and SQL Server

How to utilise dynamic code that can change at runtime to make maximum impact with the minimal amount of effort.

SQLDATABRICKS

12/5/20224 min read

What is dynamic SQL?

Dynamic SQL is a programming technique where you write a general purpose query and store it in a string variable, then alter key words in the string at runtime to alter the type of actions it will perform, the data it will return or the objects it will perform these actions on before it is actually executed.

Some form of dynamic SQL execution exists on all technologies that support the SQL language.

In this article, I will focus on Microsoft SQL Server (On-Premises or Azure) and Databricks.

Use Cases

  1. In Azure Synapse Analytics (Serverless and Dedicated Pool) you can dynamically generate an external table pointing to a location in the data lake which is determined at runtime. This is ideal when you want the external table to only point to the latest file available in a lake, not all historical files.

  1. Maintenance – you can perform the same maintenance task on a large set of objects when you make the maintenance code dynamic in terms of the object it interacts on. These actions can be to maintain indexes, drop tables not needed, clear the cache of databases etc. 

  1. Pivot and Unpivot – you can create a stored procedure to pivot or unpivot a table structure based on the column names sent in as parameters. This means you never have to google how to do the actual pivot or unpivot again – joy!

  1. JSON Normalization – you can dynamically normalize a JSON object or array by dynamically getting all the attributes from the JSON string and generating the code to persist the normalized structure.

  1. Optimized data retrieval – this is very useful for databases that get requests from multiple users at a time. You can create a dynamic query that will filter a column based on the value sent in from a front end application at runtime. This means a reduced set of data is returned, increasing performance and security.

These are but a few, I am not going to list all the use cases, I do have a life you know…

Benefits

The main benefit is you can increase the level of automation of your application by having code be generated dynamically, saving your developers the time of writing it all out manually. Especially if the majority of the structure of the query is the same with only minor differences here and there.

Considerations

  1. Dynamic SQL can be hard to debug. I typically print out the SQL statement generated first before ever adding the EXEC statement at all so I can investigate the code generated and run it manually before letting SQL run it for me.

  1. It is vulnerable to SQL Injection attacks, especially if you rely on input from users and not input from secure backend sources the users don’t interact with.

  1. It can be slower to execute due to the execution plans the SQL engine uses to perform the actions the code instructs having to be generated anew each time – depending on the degree to which the query statement is altered as part of you dynamic-ness.

SQL Server Example

In this example, the query will filter the table based on a dynamic value passed in at runtime.

SQL Server Example

In this example, the query will filter the table based on a dynamic value passed in at runtime.

Databricks Example

In Databricks, there are situations where using SQL is the better choice over Python or Scala to perform certain actions, e.g. when working with Delta tables and merging new data into it.

Some pre-requisites for the below example:

  1. You need to have a dataframe already defined earlier in the notebook and have it saved as a temporary view called sourceTemporaryView

  1. The dataframe must have a field called ProductCategory to apply the partitioning on. This line can actually also be made dynamic to pass in a dynamic field name to partition on.

  1. You must have a linked data lake where the data for this delta table will be stored. This location is defined by the vDeltaTablePath variable value. This should ideally be the path to a mounted data lake location.

Here is an example of python code where you can dynamically generate a new Delta table using SQL based on the data in a pre-defined dataframe.

Databricks Example

In Databricks, there are situations where using SQL is the better choice over Python or Scala to perform certain actions, e.g. when working with Delta tables and merging new data into it.

Some pre-requisites for the below example:

  1. You need to have a dataframe already defined earlier in the notebook and have it saved as a temporary view called sourceTemporaryView

  1. The dataframe must have a field called ProductCategory to apply the partitioning on. This line can actually also be made dynamic to pass in a dynamic field name to partition on.

  1. You must have a linked data lake where the data for this delta table will be stored. This location is defined by the vDeltaTablePath variable value. This should ideally be the path to a mounted data lake location.

Here is an example of python code where you can dynamically generate a new Delta table using SQL based on the data in a pre-defined dataframe.

This article only gave a very high-level overview of the potential that dynamic SQL holds.

But remember, with great power, comes great responsibility.

Only use dynamic SQL when necessary as it comes with a cost – see the considerations above.

Ok, good luck, have fun, and don’t accidentally drop the production database using this information, I take no responsibility.