Dynamically partitioning tables in SSAS Tabular model using SQL

In this post, I describe how we can dynamically partition tables in SSAS Tabular using SQL.
This approach works well for most simple scenarios where we need to control the number of partitions to refresh, whether daily or at any other frequency.

What is a partition?

Partitions split a table into logical partition objects. Each partition contains a portion of the data; partitions can be processed in parallel, independently of other partitions, or excluded from processing operations if they don’t need to be refreshed.

Benefits of partitioning

  • Reduce the volume of data across different environments
  • Parallel processing
  • Incremental loading
  • Set different data refresh frequencies; historical data doesn’t need to be refreshed every day

Parallel processing and incremental loading are common reasons for using partitions; however, another scenario where partitions are very useful is reducing the volume of data in each environment.
In this post, I’m going to show how we can simply and dynamically partition tables in SSAS using SQL.

Why should we limit the volume of data?

Put another way, limiting the volume of data means:

  • Reducing the size of our model
  • Reducing the processing time of our model
  • Reducing the CPU and memory consumption
  • Reducing the cost of our On-prem or cloud infrastructure

So regardless of whether our data are on-prem or in the cloud, the storage and computation needed to run our BI system come at a cost.
As it is common to keep 5+ years of historical data in a BI production system, it is certainly not a good idea to keep all that history on a Development or a Test server.

Allocating the same computational power and storage capacity to all our servers (Dev, Test, UAT, Prod) is a waste of money and energy. So before jumping into the how-to part, let me answer the "why SQL" question.

Why use SQL over other techniques?

There are many techniques for dynamically partitioning tables in SSAS Tabular data models, such as TMSL scripting, SSIS, PowerShell, or .NET, and they all come with their pros and cons.
In my opinion, the most robust technique would be SSIS or PowerShell automation combined with a release automation tool such as DevOps; however, this requires significant effort to implement and maintain.
The technique I’m going to describe has the advantage of being fairly simple, very fast to implement, and easy to maintain.

How can we keep control over the volume of data across our different servers?

Create a Param table across your various environments

This parameter table is the main component for configuring the number of partitions in each environment.
We need this table in every environment, with different values in each one; naturally, the "NbPartitions" value should be lower in the Dev/Test environments than in the Prod environment.
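
To make this concrete, here is a minimal sketch of what such a param table could look like. The exact schema isn’t shown in this post, so the column types below are assumptions inferred from the view further down:

```sql
-- Hypothetical sketch of the parameter table; the real schema may differ.
CREATE TABLE Param.ssas.CubePartitionsParam
(
    DBName       VARCHAR(128) NOT NULL,  -- SSAS database name
    TableName    VARCHAR(128) NOT NULL,  -- table to partition
    PartitionBy  VARCHAR(10)  NOT NULL,  -- 'Month' or 'Year'
    NbPartitions INT          NOT NULL   -- number of partitions to populate
);

-- Example row for a Dev server, which keeps fewer partitions than Prod:
INSERT INTO Param.ssas.CubePartitionsParam (DBName, TableName, PartitionBy, NbPartitions)
VALUES ('AdventureWorksDW2017', 'FactInternetSales', 'Month', 6);
```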

Generate dynamic filters for each partition

Once the param table was created and set up, I decided to use a view to keep things simple and avoid any dependencies.
The view dynamically generates the date keys to be used in each partition.

CREATE OR ALTER VIEW [ssas].[v_CubePartitionsList]
AS
    -- Monthly partitions: one row per month within the configured window
    SELECT      p.DBName
    ,           p.TableName
    ,           RANK() OVER (PARTITION BY p.DBName, p.TableName ORDER BY LEFT(dd.DateKey, 6) DESC) AS partitionNumber
    ,           dd.[Year]
    ,           LEFT(MIN(dd.DateKey), 6) AS FromMonth
    ,           LEFT(MAX(dd.DateKey), 6) AS ToMonth
    ,           MIN(dd.DateKey)          AS FromDate
    ,           MAX(dd.DateKey)          AS ToDate
    FROM        Param.ssas.CubePartitionsParam p
    INNER JOIN  Param.dim.[Date] dd
    ON          dd.DateKey BETWEEN CONVERT(INT, CONVERT(VARCHAR(12), DATEADD(MONTH, -p.NbPartitions, GETDATE()), 112))
                           AND     CONVERT(INT, CONVERT(VARCHAR(12), GETDATE(), 112))
    WHERE       p.PartitionBy = 'Month'
    GROUP BY    p.DBName
    ,           p.TableName
    ,           dd.[Year]
    ,           LEFT(dd.DateKey, 6)
    ,           dd.FirstDayOfMonth
    ,           dd.LastDayOfMonth

    UNION ALL

    -- Yearly partitions: one row per year within the configured window
    SELECT      p.DBName
    ,           p.TableName
    ,           RANK() OVER (PARTITION BY p.DBName, p.TableName ORDER BY dd.[Year] DESC) AS partitionNumber
    ,           dd.[Year]
    ,           LEFT(MIN(dd.DateKey), 6) AS FromMonth
    ,           LEFT(MAX(dd.DateKey), 6) AS ToMonth
    ,           MIN(dd.DateKey)          AS FromDate
    ,           MAX(dd.DateKey)          AS ToDate
    FROM        Param.ssas.CubePartitionsParam p
    INNER JOIN  Param.dim.[Date] dd
    ON          dd.[Year] BETWEEN YEAR(GETDATE()) - p.NbPartitions + 1 AND YEAR(GETDATE())
    WHERE       p.PartitionBy = 'Year'
    GROUP BY    p.DBName
    ,           p.TableName
    ,           dd.[Year]

The script I used to create the Date dimension table used in this view can be found on GitHub here.

Here is a subset of the output generated by the view for the FactSalesQuota and the FactInternetSales tables.

So far, we’ve seen how to configure and generate a dynamic view that contains the list of dates to be used for each partition.
Now let’s see how we can dynamically partition tables in SSAS using this view.

Create the maximum number of partitions you need

The trick here is to create beforehand the maximum number of partitions that you will ever need in your live environment.

So let’s assume that you need 6 partitions for your production model (current year + 5 years of history; 1 year = 1 partition), but you only need 1 year in Dev and 2 years in Test.
Your model will still have 6 partitions in each environment; however, some partitions will be empty (0 rows) in Dev and Test.
By doing so, we can manage everything with the param table on each server, and we don’t need any .NET, TMSL, or PowerShell code.

Then we use "SELECT *" in each partition to make maintenance easier: if we need to add a new column to our fact table, we do it once in the view pointing to our fact table, and all the partitions will reflect the change made in the view.
And of course, as a best practice, we should only include in the view the columns that we really need in our model!
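
A minimal sketch of such a view (the column list here is illustrative; in practice it should contain exactly the columns your model needs):

```sql
-- Illustrative sketch: expose only the columns the model actually needs.
CREATE OR ALTER VIEW dbo.v_FactInternetSales
AS
SELECT      OrderDateKey
,           ProductKey
,           CustomerKey
,           SalesOrderNumber
,           OrderQuantity
,           SalesAmount
FROM        AdventureWorksDW2017.dbo.FactInternetSales;
```

Each partition can then "SELECT *" against this view, so adding a column is a one-line change in a single place.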

WITH Partitions_CTE AS (
    SELECT      FromDate
    ,           ToDate
    FROM        [Param].[ssas].[v_CubePartitionsList]
    WHERE       DBName = 'AdventureWorksDW2017'
    AND         TableName = 'FactInternetSales'
    AND         partitionNumber = 9
)
SELECT      fct.*
FROM        AdventureWorksDW2017.dbo.FactInternetSales fct
INNER JOIN  Partitions_CTE c
ON          fct.OrderDateKey BETWEEN c.FromDate AND c.ToDate

The two values, FromDate and ToDate, are dynamically generated by the view and, in this example, correspond to the first and last day of September 2020.

This is what the above query will return:

Real case scenario

In this scenario, here is the historical data we need in each environment:

  • 6 months of data in Dev
  • 1 year of data in Test
  • 2 years of data in UAT
  • 5 years of data in Prod

As we can see below, the only thing needed to dynamically partition tables in SSAS is to change the value of "NbPartitions" in the param table; everything else is fully dynamic.
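
With monthly partitions, the four environments above translate into "NbPartitions" values along these lines (a sketch; the table and column names follow the assumed schema used throughout this post):

```sql
-- Run on each server with the value for that environment:
--   Dev:  6 months -> 6     Test: 1 year  -> 12
--   UAT:  2 years  -> 24    Prod: 5 years -> 60
UPDATE  Param.ssas.CubePartitionsParam
SET     NbPartitions = 6  -- Dev value; use 12 / 24 / 60 on Test / UAT / Prod
WHERE   DBName      = 'AdventureWorksDW2017'
AND     TableName   = 'FactInternetSales'
AND     PartitionBy = 'Month';
```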


Create all the partitions beforehand

As already mentioned, we need to create the maximum number of partitions in our model, regardless of the environment it will be deployed to.
In this scenario, we assume that we need 13 monthly partitions; in production, all the partitions are refreshed.

Now let’s see how we can process only 3 months of data in our Test environment:

TMSL vs SQL

We could certainly generate a TMSL command via SSMS and then run it on demand or schedule it at the desired frequency.
However, whenever we want to change the number of partitions to refresh, we also need to amend the JSON script to add or remove a partition, which can become hard to maintain.

Here is the TMSL command that we can use to process only three partitions:

{
  "refresh": {
    "type": "automatic",
    "objects": [
      {
        "database": "AdventureWorks2",
        "table": "Internet Sales",
        "partition": "Internet Sales - Current Month"
      },
      {
        "database": "AdventureWorks2",
        "table": "Internet Sales",
        "partition": "Internet Sales - M-1"
      },
      {
        "database": "AdventureWorks2",
        "table": "Internet Sales",
        "partition": "Internet Sales - M-2"
      }
    ]
  }
}

As we can see above, only three partitions have been refreshed, and the remaining partitions have never been refreshed.

Now let’s see what the SQL approach will do:

As we can see above, the SQL approach processes all the existing partitions instead of only three.
However, it still produces the same result, because we dynamically filter each partition by joining the fact table to the dynamic view: even though all the partitions are processed, the ones outside the configured window remain empty (0 rows).

Conclusion

Dynamically partitioning tables in SSAS Tabular can be far more complex than this approach, especially when we need to handle partitions with different refresh frequencies or granularities. But in scenarios where we refresh all the partitions, I found this technique very efficient and easy to use.

Here are the advantages/drawbacks of using this approach:

Advantages:
– It is straightforward to implement.
– It is very easy to maintain through the param table.
– It is also quite flexible: as long as we base our partitions on a date (day, month, quarter, semester, year…), any change to the logic that calculates the partition size is done in a single place: the view.

Drawbacks:
– If we have a lot of partitions, the SQL query will still be executed even when it returns 0 rows. With the right index on the right column, however, each query shouldn’t run for more than a few seconds per empty partition, so this is not a big deal.
– This approach works very well for scenarios where we refresh all the partitions (or a fixed set of partitions) in each process. For more advanced partitioning strategies, such as incremental refresh or refreshing only the partitions that have changed, it is not suitable, and we would need a more complex approach.
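
Regarding the first drawback, an index on the date key used by the partition filter (a sketch, assuming the OrderDateKey join column from the partition query shown earlier) keeps those empty-partition queries cheap:

```sql
-- Illustrative index on the fact table's date key used by the partition filter.
CREATE NONCLUSTERED INDEX IX_FactInternetSales_OrderDateKey
ON AdventureWorksDW2017.dbo.FactInternetSales (OrderDateKey);
```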
