I previously wrote a post on how to dynamically partitioning tables in SSAS tabular using SQL, but in this short post, I will focus on the performance side of partitioning.
What is Partition
Partitions split a table into logical partition objects. Each partition contains a portion of the data, partitions can be processed in parallel independent of other partitions or excluded from processing operations if they don’t need to be refreshed.
Why partitioning improves the refresh performance
The main reason for partitioning to improve the refresh performance of a model is due to parallelism.
Parallelism allows us to process multiple subsets of the same table in parallel and thus process a table faster.
When to partition
As a rule of thumb, I usually partition tables with more than 15M rows but if the refresh of your model is still fast enough there may be no need for partitioning yet.
By default, in SSAS the data are stored in segments where each segment has by default 8M rows and the first segment can store up to twice the default size (16M rows by default) so this where my rule of thumb is coming from. You can read more about segment and partitions here: Understanding segmentation and partitioning
Partitioning Dos and Don’ts
- Partitions should be equally distributed as much as possible (time columns are usually good candidates for partitions)
- Underlying tables should be indexed on the column used to filter the partition
- Avoid over partitioning your model as it could lead to the opposite intended effect
- Refresh time will increase as the engine will spend more time aggregating each partition together
- The memory size of your model increases if the partitions are too small (lower than the default segment size)
- If your model is very large it’s important to test different number of partitions
- Avoid under partitioning your model
- Give a meaningful name to each partition, I usually name my partitions “table name – period”(Sales CY, Sales Y-1…)
- Refresh only the partitions that need to be refreshed (historical data may not have changed)
Parralelism in action
Phil Seamark has recently written a post on how to Visualise your Power BI Refresh and of course we can apply the same technique of his excellent post to a Tabular Model. (You can also download the below report from Phil’s blog post).
In this example, the model that I’m going to process contains a fact table of about 110M rows and a small dimension table of 2.5k rows.
Model without Partitions – Full process
In the first scenario, we notice that there’s no parallelism occurring during the process of the fact table that’s of course because there are no partitions on this table and as this table is large enough we should partition it.
Model with Partitions – Full process
In the second scenario, I have partitioned the fact table into 12 partitions, I will certainly not partition the dimension table as it is way too small to be partitioned and I’ve run again a process full of the model.
And this time parallelism kicks in!
Parralelism and performance
As seen above parallelism helps the model to be processed faster, the duration time has been reduced by more than 30% we can also observe that the total computation time has increased “Total CPU Time” so what does this mean?
It means that to process the model faster more CPU resources were needed than in the sequential process.
Thus the number of cores available or QPU (for Azure AS) will have a significant impact on the performance of your model. You can have a lot of partitions but if your Server or Azure AS Tier does not have enough computation power you will still be limited by the hardware.
Partitioning and parallelism can significantly improve the processing time of your model, however, it is important to always test a different number of partitions and design your own rules depending on your own criteria. (cost, acceptable processing time, near real-time, complexity of your model, etc)
Also during the example that I’ve described in this post, nothing else was running on my machine at the time of processing my model so it is also crucial to pay attention to the other things happening on your server such as other models being refreshed, heavy running queries etc…