Paired T-test in Power BI using DAX

Paired T-test in Power BI using DAX

In this post, I will describe how we can implement a paired t-test in Power BI using DAX only.

What is T-Test

A t-test is a type of inferential statistic that can be used to determine if the means of two groups of data are significantly different from each other.

In other words, it tells us if the differences in means could have happened by chance.

There are three types of t-test:

In this post, we will focus only on paired t-tests and I’ll be soon writing another post for the other types of t-test.

A paired t-test is used to compare two population means where we have two samples in which observations in one sample can be paired with observations in the other sample. We compare the two sample means at different times or under different conditions.

Paired t-test Real Life Examples

Examples of where we can use paired t-test:

  • Before-and-after: Observations on the same students’ diagnostic testresults before and after a particular module or course
  • Medicine: Difference in cholesterol level before and after treatment, the difference in blood pressure before and after treatment.
  • Social research: Determine whether there is a significant change in the scores of the same cases on the same variables over time such as % turnout in presidential elections by states

Paired t-test assumptions

  • Independence of the observations: Measurements for one subject do not affect measurements for any other subject
  • Each of the paired measurements must be obtained from the same subject
  • The differences between pairs are normally distributed

Implement Paired T-test in Power BI using DAX

In this section, I will break down every single step on how to implement a Paired T-test in Power BI from t-stat, p-value, standard error, confidence interval and critical value. I will explain the role of each statistical measure what they are used for and how to calculate them using DAX only.

This image has an empty alt attribute; its file name is image-2.png
  • “d-bar” is the average difference between paired data
  • “SE” is the standard error of “d-bar” (we’re going to cover it further down)
  • “: (delta greek letter) – since we’re using paired data sample delta is equal to zero (we’ll cover it in the null hypothesis section)

Why using Power BI with DAX instead of R/Python

  • Fully interactive visuals; custom visuals using R or Python are not
  • Security policy within your organisation: Limitation to deploy python or R script to the portal
  • Force you to better understand the test statistics process
  • Easier to maintain: one language, one tool, integration and deployment easier, reusability
  • Only DAX and some statistics knowledge required

The data

The dataset contains data about the US presidential election at a county level with the percentage of votes that went to Republican candidates in percentage in 20212 and 2016. (500 rows).

I downloaded this dataset from Datacamp but it is also publically available at https://dataverse.harvard.edu/dataverse/.
I chose this dataset because I needed a dataset with enough rows to make it easier to visualize the distribution.

The model

The model contains the main datasets that I described above and two parameters table that I will describe later.

Hypotheses

Here we want to compare and test the two paired samples (2012 vs 2016) and we can make three different hypotheses:

  • Two-tailed: Is there any difference in means between the % of votes given to the Republican candidates between 2012 and 2016?
    • H0=μ2012 – μ2016 = 0
    • Ha =μ2012 – μ2016 <> 0
  • Left-tailed: Was the % of votes given to the Republican candidates lower in 2012 compared to 2016?
    • H0=μ2012 – μ2016 >= 0
    • Ha =μ2012 – μ2016 < 0
  • Right-tailed: Was the % of votes given to the Republican candidates greater in 2012 compared to 2016?
    • H0=μ2012 – μ2016 <= 0
    • Ha =μ2012 – μ2016 > 0

We will cover the null hypothesis “H0” and the alternative hypothesis “Ha later in this post.

Calculate the difference in means

The first step to implementing our paired t-test in Power BI is to compare the two paired samples is of course to calculate their difference in means.

So after calculating the difference between the two variables “dem_percent_12” and “dem_percent_16” we can simply calculate the mean using the average Dax function.

diff 2012 vs 2016 = dem_county_pres[dem_percent_12]-dem_county_pres[dem_percent_16]
mean_diff = AVERAGE(dem_county_pres[diff 2012 vs 2016])

Sample Standard Deviation

The standard deviation (sd) is a measure of how spread out values are. A small standard deviation indicates that the values tend to be close to the mean, while a large standard deviation indicates that the values are spread out over a wider range. We will use the SD measure to calculate the Standard Error.

Luckily there’s a built-in Dax function for the standard deviation, here we’re using the sample standard deviation formula since we’re working with a sample instead of a whole population.
The only difference between the sample sd and the sd formula is the denominator “n-1” for the sample sd instead of “n” for the sd. (the larger the sample is the closer the result of the two formulas will be)

sd_diff = STDEV.S(dem_county_pres[diff 2012 vs 2016])

Standard Error of the mean

To put it simply the Standard Error SE or SEM in our case is the estimated standard deviation of the sample mean

Its formula is the standard deviation (calculated above) divided by the square root of the sample size.

The difference between the SD and the SEM is that the standard deviation measures the dispersion from the individual values to the mean, while the Standard Error of the mean measures how far the sample mean of the data is likely to be from the true population mean.

SEM = 
var __sd=[sd_diff]
var __n=[Size]
return
divide(__sd,sqrt(__n))

t-statistic

The t-statistic (also called t-value or t-score) is used in a t-test to determine whether to support or reject the null hypothesis

The larger the t-value is, the more likely the difference in means between the two samples will be statistically significant.
In order to support or reject the null hypothesis, we need to compare the t-stats result with the t-critical value given by the t-distribution table.

t_stat = 
var __meandiff= [mean_diff]
var __sddiff= [sd_diff]
var __n=[Size]
var __parammudiff=0
var __se=[SE]
return
divide(__meandiff-__parammudiff,__se)

t-critical value

The t-critical value is the cutoff between retaining or rejecting the null hypothesis. If the t-statistic value is greater than the t-critical, meaning that it is beyond it on the x-axis, then the null hypothesis is rejected and the alternate hypothesis is accepted.

How to calculate the t-critical value?
Without a computer calculating the critical requires the use of the t-distribution table.

  • step 1: Calculate the degree of freedom df –> sample size -1 (15-1 =14 for the example above)
  • step 2: Choose the alpha level, alpha level is the threshold value used to judge whether a test statistic is statistically significant we often use 0.05 (95% of confidence) but it can vary according to the domain area (we used 0.05 in the abox example)
  • step 3: Choose either the one or two-tailed distribution
    • one tail:
      • left-tailed: difference in means between the paired samples is strictly lower than 0
      • right-tailed: difference in means between the paired samples is strictly greater than 0 (example above)
    • two-tailed: difference in means between the paired samples is not equal to 0 (greater or lower but not equal)
  • step 4: lookup for the df, alpha level and the one-tailed/tow-tailed intersection in the grid

Luckily, we don’t need to import the t-distribution table in Power BI and do the lookup ourselves since we can use the built-in DAX function T.INV for on tailed T-test and for a T.INV.2 T-test. The two parameters that we need to pass to the function are the “probability” and the degree of freedom.

I manually entered some parameters into a table called “CI” to dynamically run different Paired T-test in Power BI. The Critical Value parameter is not needed since we can compute it using the two functions mentioned but I like to keep it to quickly refer to it if needed.

The above table is linked to the table “Hypothesis” which contains the hypothesis tail that we want to use for the test.

Here is the formula to calculate the critical value using DAX and dynamically interact with the parameters (probability and on-tailed/two-tailed)

t_val = 
var __df=[degree of freeedom]
return
switch(TRUE(),        
       FIRSTNONBLANK(Hypothesis[Tail],1)="Left-Tailed",abs(T.INV(FIRSTNONBLANK(CI[Probability],1),__df)),
       FIRSTNONBLANK(Hypothesis[Tail],1)="Right-Tailed",abs(T.INV(FIRSTNONBLANK(CI[Probability],1),__df)),
       T.INV.2T(1-FIRSTNONBLANK(CI[Probability],1),__df)
       )

Left critical region
We have now calculated the t-critical value we just need to add its value to the mean of the difference of our two samples “dbar”.

For a left-tailed test, we will only be looking into the left-critical region so to reject the null hypothesis the t-stat must lie to the left of the critical value in other words it should be lower than the left critical value.

For a two-tailed test, the T-stats should either be lower than the Left-critical value or greater than the right-critical value.
The formula for the left critical value is: mean_diff – critical value

left cr = 
var __crit = [t_val]
var __mudiff=[mean_diff]
return
switch(TRUE(),        
       FIRSTNONBLANK(Hypothesis[Tail],1)="Left-Tailed",__mudiff-__crit,
       FIRSTNONBLANK(Hypothesis[Tail],1)="Right-Tailed",BLANK(),
       __mudiff-__crit
       )

Right critical region
As for the right-tailed test, the T-stat must be greater than the right-critical value to reject the null hypothesis.
The formula for the right-critical value is: mean_diff + critical value

right cr = 
var __crit = [t_val]
var __mudiff=[mean_diff]
return
switch(TRUE(),        
       FIRSTNONBLANK(Hypothesis[Tail],1)="Left-Tailed",BLANK(),
       FIRSTNONBLANK(Hypothesis[Tail],1)="Right-Tailed",__mudiff+__crit,
       __mudiff+__crit
       )

P-value

A p-value is used in hypothesis testing to help us support or reject the null hypothesis. The p-value is the evidence against a null hypothesis. The smaller the p-value, the stronger the evidence that we should reject the null hypothesis.

As we already know from the critical value section, the critical value is a point beyond which we can reject the null hypothesis. P-value on the other hand is defined as the probability that an observed difference could have occurred just by random chance. The benefit of using a p-value is that we can test the estimated probability at any desired level of significance by comparing this probability with the significance level “Alpha” without needing to recalculate the critical value each time.

To sum it up they both do the same thing: helping us to support or reject the null hypothesis in a test. They are two different approaches to the same result.

I personally tend to always use the p-value since I find it easier to calculate and interpret. (e.g. with a p-value of 0.06 we may fail to reject the null hypothesis however we can still observe moderate evidence)

The built-in DAX functions to calculate the p-value are T.DIST for the left-tailed test, T.DIST.RT for the right-tailed test and T.DIST.2T for the two-tailed test. These functions require two parameters the t-stat and the degree of freedom.

And here is the formula to dynamically interact with the different parameters.

p-value = 
var __df=[degree of freeedom]
var __t_stat=[t_stat]
return
SWITCH(TRUE(),
        FIRSTNONBLANK(Hypothesis[Tail],1)="Left-Tailed",T.DIST(__t_stat,__df,TRUE()),
        FIRSTNONBLANK(Hypothesis[Tail],1)="Right-Tailed",T.DIST.RT(__t_stat,__df),
        T.DIST.2T(abs(__t_stat),__df)
)

Confidence Interval

Formula for t interval | Confidence interval, Change management, Mathematics

Confidence Interval or CI is a range of values we are fairly sure our true value lies in.
In another word, the CI can answer the question of whether the result of our test is due to a chance or not within a certain degree of confidence.
The confidence level should be chosen before examining the data, a 95% confidence level is usually used. However, confidence levels of 90% and 99% are also often used depending on the domain area.

Note that a one-tailed confidence interval always extends from minus infinity to some value above the observed effect, or from some value below the observed effect to plus infinity.

Lower

Here is the DAX formula to calculate the CI Lower limit and dynamically interact with the different parameters.

lower = 
var __meandiff=[mean_diff]
var __t=[t_val]
var __sd=[sd_diff]
var __n=[size]
return
switch(TRUE(),
        FIRSTNONBLANK(Hypothesis[Tail],1)="Left-Tailed","-inf",
        FIRSTNONBLANK(Hypothesis[Tail],1)="Right-Tailed",__meandiff-__t*divide(__sd,sqrt(__n)),
        __meandiff-__t*divide(__sd,sqrt(__n))
)

Upper

And her is the Dax formula for the Upper limit. For a right-tailed the

upper = 
var __meandiff=[mean_diff]
var __t=[t_val]
var __sd=[sd_diff]
var __n=[Size]
return
switch(TRUE(),
        FIRSTNONBLANK(Hypothesis[Tail],1)="Left-Tailed",__meandiff+__t*divide(__sd,sqrt(__n)),
        FIRSTNONBLANK(Hypothesis[Tail],1)="Right-Tailed","+inf",
        __meandiff+__t*divide(__sd,sqrt(__n))
)

Result

Before visualising the outcome of our paired t-test in Power BI let’s define the null and alternative hypotheses.

The null hypothesis and alternate hypothesis

The null hypothesis H0 assumes that any difference between the two paired samples is due to chance.

  • For a two-tailed test, the null hypothesis assumes the difference in means is equal to 0
  • For a left-tailed test the null hypothesis assumes the difference in means is not lower than 0
  • For a right-tailed test the null hypothesis assumes the difference in means is not greater than 0

As for the alternate hypothesis Ha, it is simply the direct contradiction of the null hypothesis.

Here is the DAX measure to display the text result of our paired t-test:

Result = 
var __lessAlt="The true difference in means is less than 0"
var __greaterAlt="The true difference in means is greater than 0"
var __twosidedAlt="The true difference in means is not equal to 0"
var __lessNull="The true difference in means is not less than 0"
var __greaterNull="The true difference in means is not greater than 0"
var __twosidedNull="There's no true difference in the means"
var __alpha=FIRSTNONBLANK(CI[Alpha],1)
var __Pval=[p-value]
return
switch(TRUE(),        
       FIRSTNONBLANK(Hypothesis[Tail],1)="Left-Tailed",
        if(__Pval<__alpha,"We reject the nUll hypothesis and we accept the alternative hypothesis: " & __lessAlt, "We fail to reject the null hypothesis: " & __lessNull),
       FIRSTNONBLANK(Hypothesis[Tail],1)="Right-Tailed",
        if(__Pval<__alpha,"We reject the null hypothesis and we accept the alternative hypothesis: " & __greaterAlt, "We fail to reject the null hypothesis: " & __greaterNull),
       if(__Pval<__alpha,"We reject the null hypothesis and we accept the alternative hypothesis: " & __twosidedAlt, "We fail to reject the null hypothesis: " & __twosidedNull)
       )

Visualize the result: two-tailed test

Here we’re using a CR of 95% (or alpha 0.05).
As we can see the t-stat “30.30” is much greater than the critical value “8.79” or if we use the p-value approach we can see that the p-value is extremely low and far below the alpha significant level so we reject the null hypothesis in favour of the alternative hypothesis.

Paired T-test in Power BI

Visualize the result: Right-tailed test

This time we’re running the paired t-test in Power BI with a 99% confidence interval and for a right-tailed.
In other words, we want to observe if the difference in the means is greater than 0.
Since we’re using a right-tailed test we’re using only the right critical value so the t-test must be greater than the right critical value which is the case so we reject the null hypothesis. (p-value is also below the alpha significant level)

As for the Confidence Interval, we can say that we’re 99% confident that the true difference in means should lie between 6.30 and +infinite.

Visualize the result: Left-tailed test

Times to run a left-tailed test… Can we reject the null hypothesis?
Of course not! The t-stat “30.30” is far greater than the left critical value “5.18” and the p-value is extremely large “1” so without any doubt we fail to reject the null hypothesis.

R output

To make sure that I correctly implemented the paired t-test in Power BI I added and displayed the result of the R “t.test” function and all results were accurate thanks to the great built-in DAX functions supported by Power BI. I hope that the PBI team will add even more statistical functions in the future.

It seems that R Visuals do not render from time to time when using Publish to the web but if you click on “Focus mode” then “Back to report” it eventually appears after a few seconds. Another reason to not use R and stick to DAX 🙂

Conclusion

This post only covered the implementation of the paired T-test in Power BI but as we saw there are three types of t-test so I should soon post the implementation of the other t-test.

Also, there are a few things that I did not cover on purpose since I did not want this post to become too statistics-heavy but rather focus on the DAX implementation side, however, one thing that I need to mention is that I by default used the Welch’s t-test (R used it by default as well), there’s another t-test called student t-test (its formula is slightly different) and it requires some extra assumptions on the group size and the variance of the two groups which make it less likely to be used. (I may cover it in a future post but here is an interesting post that talks about the differences).

I previously wrote a post about AB testing using Power BI using DAX only so in this other DAX statistics post I wanted to show that implementing a paired t-test in Power BI using DAX only is perfectly feasible it just requires a bit more work than typing two lines of code in R but with the help of some bulti-in DAX functions the result is great.

Final Note

If you’d like to implement your own paired t-test in Power BI and test your result I’d recommend using the following t-test calculator: http://www.sthda.com/english/rsthda/paired-t-test.php

The Power BI

Here is the “Paired t-test in Power BI” published to the web where we can interact with the parameters “Confidence Interval” and “hypothesis”

18 thoughts on “Paired T-test in Power BI using DAX

  1. Hello,

    Firstly, thank you for your well explained post!! Well done!!
    Could you share the file in powerBI?

    I am wondering if the R section is not running because of my root directory or if it’s because I am not super user in this windows machine.

    Best regards,
    Petter

  2. Hi Ben,

    Thanks a million for this. It is however possible to have a measure of average that is correctly applied when you do a state slizer.

  3. Thank you for the article, Ben!
    It seems like you’ve missed a column name a few times:

    should be AVERAGE(dem_county_pres[diff 2012 vs 2016])
    while you have:

    dem_county_pres[dem_percent_16] mean_diff = AVERAGE(dem_county_pres)

    1. Hi Andrii,
      Thanks for your comment!
      As for the column I’m not sure to see where the column name is missing the column mean_diff as the following formula “mean_diff = AVERAGE(dem_county_pres[diff 2012 vs 2016])”

  4. Well, that’s just mind blowing! Thank you so much for the amazing quality work you do! I am currently struggling to implement more advanced statistics in my Power BI dashboards and I found you blog really helpful. I think that statistics should be used much often in daily work of data analysts.

    1. Hey Tomasz,
      thanks for your comment Yes fully agree anyone who needs to go deeper into data analytics will need to use more advanced statistics

  5. This article and your blog are so helpful! I’m really hoping to implement this for my organization. We have before and after timepoints for each client that I want to run a t test on. I’m wondering, does using the Welch’s t test you use here cover this scenario? I’ve seen that Welch’s test is often used for independent samples rather than paired ones but I’m unsure if that matters here. Thanks so much!

    1. Hi Emily,
      thanks for your comment!
      You’re right about the usage of independent sample t-test but in your scenario you can stick to the paired t-test because you are comparing the exact same group but at different times this is known as repeated measure t-Test which uses the paired t-test formula.
      As long as the group that you test is the same and the dependent variable is also the same you’re good to go with the paired t-test 🙂

Leave a Reply

Your email address will not be published. Required fields are marked *