Dynamic SQL in Databricks and SQL Server
How to utilise dynamic code that can change at runtime for maximum impact with minimal effort.
SQL | DATABRICKS
12/5/2022 · 4 min read
What is dynamic SQL?
Dynamic SQL is a programming technique where you write a general-purpose query and store it in a string variable, then alter key parts of the string at runtime to change the actions it performs, the data it returns, or the objects it acts on before the string is actually executed.
Some form of dynamic SQL execution exists on all technologies that support the SQL language.
In this article, I will focus on Microsoft SQL Server (On-Premises or Azure) and Databricks.
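As a minimal sketch of the idea in T-SQL (the table name here is a hypothetical stand-in), the object the query runs against is only decided when the string is built:

-- The table name is only known at runtime, so it is concatenated
-- into the statement string before execution.
DECLARE @TableName SYSNAME = N'dbo.SalesOrder';
DECLARE @Sql NVARCHAR(MAX);

SET @Sql = N'SELECT COUNT(*) FROM ' + @TableName + N';';

EXEC (@Sql);  -- the string is only parsed and executed at this point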
Use Cases
In Azure Synapse Analytics (Serverless or Dedicated SQL Pool) you can dynamically generate an external table pointing to a data lake location that is determined at runtime. This is ideal when you want the external table to point only to the latest file available in the lake, not all historical files (see the first sketch after this list).
Maintenance – you can perform the same maintenance task on a large set of objects by making the maintenance code dynamic in terms of the object it operates on. These actions can be to rebuild indexes, drop tables that are no longer needed, clear database caches, etc.
Pivot and Unpivot – you can create a stored procedure to pivot or unpivot a table structure based on the column names sent in as parameters. This means you never have to google how to do the actual pivot or unpivot again (see the second sketch after this list) – joy!
JSON Normalization – you can normalize a JSON object or array by extracting all the attributes from the JSON string at runtime and generating the code to persist the normalized structure.
Optimized data retrieval – this is very useful for databases that receive requests from multiple users at a time. You can create a dynamic query that filters a column based on the value sent in from a front-end application at runtime. This means a reduced set of data is returned, improving both performance and security.
These are but a few – I am not going to list all the use cases, I do have a life you know…
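Here is a hedged sketch of the first use case above: the folder is resolved at runtime and spliced into a CREATE EXTERNAL TABLE statement. The table, columns, data source, file format and path are all hypothetical and assume the external data source and file format already exist.

-- A minimal sketch for Synapse, assuming MyDataLake and ParquetFormat exist.
DECLARE @LatestFolder NVARCHAR(200) = N'sales/2022/12/';  -- resolved at runtime
DECLARE @Sql NVARCHAR(MAX);

SET @Sql = N'
CREATE EXTERNAL TABLE dbo.LatestSales (
    SaleId INT,
    SaleAmount DECIMAL(18, 2)
)
WITH (
    LOCATION = ''' + @LatestFolder + N''',
    DATA_SOURCE = MyDataLake,
    FILE_FORMAT = ParquetFormat
);';

PRINT @Sql;  -- inspect the generated statement first
EXEC sys.sp_executesql @Sql;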
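And a sketch of the pivot use case: the column list is built from the data itself, so the same code handles any set of category values. The dbo.SalesOrder table and its columns are hypothetical, and STRING_AGG requires SQL Server 2017 or later.

-- A minimal sketch of a dynamic pivot.
DECLARE @Columns NVARCHAR(MAX), @Sql NVARCHAR(MAX);

-- Build the pivoted column list from the distinct category values.
SELECT @Columns = STRING_AGG(QUOTENAME(ProductCategory), ', ')
FROM (SELECT DISTINCT ProductCategory FROM dbo.SalesOrder) AS c;

SET @Sql = N'
SELECT ' + @Columns + N'
FROM (SELECT ProductCategory, SalesAmount FROM dbo.SalesOrder) AS src
PIVOT (SUM(SalesAmount) FOR ProductCategory IN (' + @Columns + N')) AS p;';

PRINT @Sql;  -- inspect the generated statement first
EXEC sys.sp_executesql @Sql;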
Benefits
The main benefit is that you can increase the level of automation in your application by generating code dynamically, saving your developers the time of writing it all out manually – especially when the majority of the query structure stays the same with only minor differences here and there.
Considerations
Dynamic SQL can be hard to debug. I typically print out the generated SQL statement before ever adding the EXEC statement at all, so I can investigate the generated code and run it manually before letting SQL run it for me.
It is vulnerable to SQL injection attacks, especially if you rely on input from users rather than input from secure back-end sources the users don't interact with (see the sketch after this list).
It can be slower to execute because the execution plans the SQL engine uses to perform the actions may have to be generated anew each time – depending on the degree to which the query statement is altered as part of your dynamic-ness.
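To make the injection risk concrete, here is a hedged contrast using a hypothetical dbo.Users table and an input value that, in a real application, would arrive from a front end:

DECLARE @UserInput NVARCHAR(100) = N'Alice';  -- imagine this arrives from a front end
DECLARE @Sql NVARCHAR(MAX);

-- Vulnerable: the input is concatenated straight into the statement,
-- so input such as ' OR 1=1 -- would change the query's meaning.
SET @Sql = N'SELECT * FROM dbo.Users WHERE Name = ''' + @UserInput + N''';';

-- Safer: the value travels as a typed parameter and cannot rewrite the query.
EXEC sys.sp_executesql
    N'SELECT * FROM dbo.Users WHERE Name = @pName;',
    N'@pName NVARCHAR(100)',
    @pName = @UserInput;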
SQL Server Example
In this example, the query will filter the table based on a dynamic value passed in at runtime.
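A minimal sketch of such a query is below; the dbo.SalesOrder table and its ProductCategory column are hypothetical stand-ins. Note that the runtime value is passed to sp_executesql as a typed parameter rather than concatenated into the string, which protects against injection.

-- A minimal sketch: the filter value arrives at runtime, e.g. from a front end.
DECLARE @ProductCategory NVARCHAR(100) = N'Bikes';
DECLARE @Sql NVARCHAR(MAX);

SET @Sql = N'SELECT * FROM dbo.SalesOrder WHERE ProductCategory = @pCategory;';

PRINT @Sql;  -- inspect the generated statement before executing it

EXEC sys.sp_executesql
    @Sql,
    N'@pCategory NVARCHAR(100)',
    @pCategory = @ProductCategory;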
Databricks Example
In Databricks, there are situations where SQL is the better choice over Python or Scala for certain actions, e.g. when working with Delta tables and merging new data into them.
Some prerequisites for the example below:
You need to have a dataframe already defined earlier in the notebook, saved as a temporary view called sourceTemporaryView.
The dataframe must have a field called ProductCategory to apply the partitioning on. The partition column could itself be made dynamic by passing in the field name as a parameter.
You must have a linked data lake where the data for this Delta table will be stored. This location is defined by the vDeltaTablePath variable and should ideally be a mounted data lake path.
Here is an example of Python code that dynamically generates a new Delta table using SQL, based on the data in a pre-defined dataframe.
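A minimal sketch of that code follows; the table name and lake path are hypothetical stand-ins, while sourceTemporaryView, ProductCategory and vDeltaTablePath match the prerequisites above.

# Build the CREATE TABLE statement as a string, then let Spark execute it.
vTableName = "ProductSales"  # hypothetical table name
vDeltaTablePath = "/mnt/datalake/delta/productsales"  # mounted lake path

vSqlStatement = f"""
CREATE TABLE IF NOT EXISTS {vTableName}
USING DELTA
PARTITIONED BY (ProductCategory)
LOCATION '{vDeltaTablePath}'
AS SELECT * FROM sourceTemporaryView
"""

print(vSqlStatement)  # inspect the generated SQL before executing it
spark.sql(vSqlStatement)

The same string-building pattern works for MERGE INTO statements when you need to merge new data into an existing Delta table.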
This article only gave a very high-level overview of the potential that dynamic SQL holds.
But remember: with great power comes great responsibility.
Only use dynamic SQL when necessary as it comes with a cost – see the considerations above.
Ok, good luck, have fun, and don’t accidentally drop the production database using this information, I take no responsibility.