In this post, I am going to share a step-by-step guide to implementing A/B testing with Power BI.
A/B testing, or split testing, is a digital marketing technique that involves comparing two versions of a web page or application to see which performs better. The two versions, A and B, are shown at random to different visitors at the same time. A statistical analysis of the results then determines which version performs better according to predefined indicators such as conversion rate.
Goals of this Post
- What A/B testing is and when to use it
- Why use Power BI instead of R/Python
- Implement A/B testing with Power BI
- Use some of the advanced DAX statistical functions
What is A/B testing?
A/B testing is a simple approach to comparing two versions of something to figure out which performs better. Even though A/B testing can be used for any experiment, it is most often used in the digital marketing world (websites, apps, email campaigns, and so on).
To keep it short and simple, an A/B test is an application of statistical hypothesis testing that consists of a randomized experiment with two variants, A and B.
Typically in A/B testing, the variant that gives higher conversions is the winning one, and that variant can help optimize a website for better results. The metrics for conversion are unique to each website.
For eCommerce, it may be the sale of the products, while for B2B, it may be the generation of qualified leads.
For example, an online retail company wants to test the hypothesis that making the "Add to Cart" button brighter will lead to an increase in the number of people adding items to their shopping carts.
By having the two websites active at once and randomly directing users to one or the other, the online retail company can monitor the impact of making the “Add to Cart” button brighter and then draw conclusions accordingly.
A/B Testing with Power BI
Why use Power BI instead of R/Python?
The main reason I decided to write this post is that I couldn't find any post on how to implement A/B testing in Power BI with DAX, so I thought it would be interesting to try out the advanced statistical functions of DAX.
Obviously, Power BI is not R, Python, or SPSS, but I feel like the built-in DAX statistical functions are amazing and often overlooked.
So through this post, I want to share the hidden or little-known capabilities of the Power BI statistical functions.
There are also good practical reasons to implement A/B testing with Power BI:
- No Python or R knowledge within your company
- Security policy: your organisation does not allow R/Python scripts to be deployed to the Power BI portal
- Integrating and deploying R/Python scripts is more complex
- One tool (no need to invest in a statistical tool)
- One language (easier to maintain)
- Fully interactive visuals (custom visuals using R or Python are not)
- It’s fun 🙂
Create an A/B Test with Power BI
Before jumping into the advanced DAX statistical functions, let's look at the data on which we want to run an A/B test.
- One month of data
- Data are split into two groups (Control and Var)
- Over the month there were 6,178 visits and 155 leads, disregarding group assignment (base measures are sketched below)
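The measures in the rest of this post reference two base measures, [Nb Visits] and [Nb Leads], which are never shown explicitly. Here is a minimal sketch of how they could be defined, assuming a fact table named ConversionRateABTesting with one row per visit and a 0/1 IsConvert column (both names appear in the DAX later in this post):

```dax
-- Assumed base measures: one row per visit, IsConvert = 1 when the visit produced a lead
Nb Visits = COUNTROWS ( ConversionRateABTesting )
Nb Leads = SUM ( ConversionRateABTesting[IsConvert] )
```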
Now let’s visualise the data summarized by groups.
| | Ctrl Group | Var Group |
| --- | --- | --- |
| Conversion Rate | 2.141% | 2.864% |
The variation group seems to perform better as its conversion rate of 2.86% is higher than the conversion rate of the control group.
But how can we be confident that this improvement is the result of the changes we made and not a result of random chance?
Let’s now jump into the statistic functions and answer this question.
Advanced DAX Statistical functions
The conversion rate or success rate is the percentage of users who take the desired action. For example, in online retail, the conversion rate is the percentage of website visitors who buy something on the site.
```dax
-- If we use a 0/1 flag for leads, the average of the flag gives us the conversion rate
conversion% = AVERAGE ( ConversionRateABTesting[IsConvert] )

-- If we don't use a flag, we can simply divide the number of leads by the number of visits
conversion% = DIVIDE ( [Nb Leads], [Nb Visits] )
```
To simplify the next formulas, I also compute the conversion rate for each group:

```dax
Ctrl Group Conversion % =
CALCULATE ( [conversion%], ConversionRateABTesting[test_assignment] = "Control Group" )
```
```dax
Var Group Conversion % =
CALCULATE ( [conversion%], ConversionRateABTesting[test_assignment] = "Var Group" )
```
A Confidence Interval (CI) is a range of values within which we are fairly sure the true value lies.
In other words, the CI helps answer the question of whether the result of our experiment comes from the changes we made or from random chance.
The confidence level should be chosen before examining the data; a 95% confidence level is usually used, although confidence levels of 90% and 99% are also common in analysis.
As you will see below I’ve created a z-value table that allows me to choose a CI between 90% and 99%.
The Standard Error is needed to compute the Confidence Interval, but what is the Standard Error (SE)?
If we were to repeat the same A/B test multiple times, we would notice that the conversion rate varies.
So we use the “Standard Error” to calculate the range of possible conversion values for a specific variation.
For a given conversion rate (CR) and a number of trials (n), the standard error is calculated as:
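In mathematical notation, this is:

$$SE = \sqrt{\frac{CR \times (1 - CR)}{n}}$$

And the equivalent DAX measure: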
```dax
std_Error =
SQRT ( [conversion%] * ( 1 - [conversion%] ) / [Nb Visits] )
```
In order to simplify the next formulas, I compute the SE for each group (Control and Variation):

```dax
Ctrl SE = CALCULATE ( [std_Error], ConversionRateABTesting[test_assignment] = "Control Group" )
Var SE = CALCULATE ( [std_Error], ConversionRateABTesting[test_assignment] = "Var Group" )
```
Conversion Rate Upper and Lower Limit
Since I've calculated the Standard Error, I can now compute the upper and lower limits of the conversion rate, based on a specific confidence level defined by the Z-value.
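These limits form the standard normal-approximation confidence interval around the conversion rate:

$$CI = CR \pm z \cdot SE$$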
Conversion Rate Lower Limit:

```dax
Conversion Rate Min =
[conversion%] - [std_Error] * FIRSTNONBLANK ( 'CI'[z value], 1 )
```

Conversion Rate Upper Limit:

```dax
Conversion Rate Max =
[conversion%] + [std_Error] * FIRSTNONBLANK ( 'CI'[z value], 1 )
```
Below is a table that contains the different values for Z and Alpha.
You can find more information about the Z-table here.
To keep it short, the Z-table represents the values of the cumulative distribution function of the normal distribution. It is used to find the probability that a statistic is observed below, above, or between values on the standard normal distribution.
| Confidence Interval | Z value | Hypothesis | Alpha |
| --- | --- | --- | --- |
| 90% | 1.645 | Two-Tailed | 0.10 |
| 95% | 1.960 | Two-Tailed | 0.05 |
| 99% | 2.576 | Two-Tailed | 0.01 |
| 90% | 1.282 | One-Sided | 0.10 |
| 95% | 1.645 | One-Sided | 0.05 |
| 99% | 2.326 | One-Sided | 0.01 |
For example, in statistics, 1.96 is the approximate value of the 97.5th percentile of the standard normal distribution, which means that 95% of the area under a normal curve lies within roughly 1.96 standard deviations of the mean.
A Z-score is a number that describes a value’s relationship to the mean of a group of values.
Z-score is measured in terms of standard deviations away from the mean.
Z-scores may be positive or negative, with a positive value indicating the score is above the mean and a negative score indicating it is below the mean.
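For comparing two conversion rates, the Z-score measures the difference between the two rates in units of their combined standard error:

$$Z = \frac{CR_{var} - CR_{ctrl}}{\sqrt{SE_{ctrl}^2 + SE_{var}^2}}$$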
In DAX, the Z-score measure is as follows:
```dax
Z Score =
VAR ctrl_Conv = [Ctrl Group Conversion %]
VAR test_Conv = [Var Group Conversion %]
VAR ctrl_SE = [Ctrl SE]
VAR test_SE = [Var SE]
RETURN
    ( test_Conv - ctrl_Conv )
        / SQRT ( POWER ( ctrl_SE, 2 ) + POWER ( test_SE, 2 ) )
```
Uplift is simply the relative increase in conversion rate between Group A and Group B. It is possible to have negative uplift if our Control group is more effective than the new Var group. Here is the formula:
```dax
Var vs Ctrl % Diff =
DIVIDE ( [Var Group Conversion %], [Ctrl Group Conversion %] ) - 1
```
- Var Group Conversion % = 2.864%
- Ctrl Group Conversion % = 2.141%
- Uplift = 2.864/2.141-1=33.79%
- (The Var group is 33.79% more effective than the Ctrl group)
Now, using the Standard Error measure and the Z-value, we can also compute the minimum and maximum expected uplift within a specific level of confidence.
```dax
Var vs Ctrl % Diff Max =
VAR _Z = FIRSTNONBLANK ( CI[z value], 1 )
RETURN
    DIVIDE ( [Var Group Conversion %] + [Var SE] * _Z, [Ctrl Group Conversion %] ) - 1
```

```dax
Var vs Ctrl % Diff Min =
VAR _Z = FIRSTNONBLANK ( CI[z value], 1 )
RETURN
    DIVIDE ( [Var Group Conversion %] - [Var SE] * _Z, [Ctrl Group Conversion %] ) - 1
```
A p-value is used in hypothesis testing to either support or reject the null hypothesis. The p-value is the evidence against a null hypothesis. The smaller the p-value, the stronger the evidence that you should reject the null hypothesis.
In our scenario, depending on the type of test (one- or two-tailed), the null hypothesis states:
The two-tailed test takes as a null hypothesis that both groups have equal conversion rates.
The one-tailed test takes as a null hypothesis that the variation group is not better than the control group, but can actually be worse.
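Under the null hypothesis, the Z-score follows a standard normal distribution, so the one-tailed p-value is the area in the tail beyond the observed Z-score:

$$p = 1 - \Phi ( |Z| )$$

where $\Phi$ is the standard normal cumulative distribution function.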
Here is the DAX formula to calculate the p-value from the Z-score:
```dax
P-value = 1 - NORM.DIST ( ABS ( [Z Score] ), 0, 1, TRUE )
```
Power, or "1 − Beta", describes the strength of our test to detect an actual difference in the Variation group.
Conversely, Beta is the probability that our test does not reject the null hypothesis when it should actually reject it (a Type II error). The higher the power, the lower the probability of a Type II error.
Experiments are usually set at a power level of 80%, or 20% Beta.
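To make the DAX measure below easier to follow: when the variation outperforms the control (or the test is one-sided), it computes

$$\text{Power} = 1 - \Phi \left( \frac{CR_{ctrl} + z \cdot SE_{ctrl} - CR_{var}}{SE_{var}} \right)$$

and the mirror-image expression, with the two groups swapped, otherwise.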
Here is the DAX measure to calculate the power:
```dax
Power =
VAR _Z = FIRSTNONBLANK ( CI[z value], 1 )
RETURN
    IF (
        [Var Group Conversion %] > [Ctrl Group Conversion %]
            || FIRSTNONBLANK ( Hypothesis[Tail], 1 ) = "One-Sided",
        1 - NORM.DIST ( ( [Ctrl Group Conversion %] + [Ctrl SE] * _Z - [Var Group Conversion %] ) / [Var SE], 0, 1, TRUE ),
        1 - NORM.DIST ( ( [Var Group Conversion %] + [Var SE] * _Z - [Ctrl Group Conversion %] ) / [Ctrl SE], 0, 1, TRUE )
    )
```
A/B Testing in Power BI
Now that we have defined all the measures let’s plug everything together.
Here is the final A/B testing report in Power BI.
Line Chart – The number of visits Control vs Var
Line Chart – The number of orders Control vs Var
Confidence Interval and Test type Slicers & Uplift and P-value KPIs
To get the CI and Hypothesis tables populated, I just entered the data manually (see the Z-score section).
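If you prefer to keep everything in DAX, a calculated table is one way to create such a lookup. Below is a minimal sketch using DATATABLE with the standard two-tailed z-values; the column names mirror the references to CI[z value] in the measures above (the "Enter data" button in Power BI Desktop achieves the same result):

```dax
-- A minimal sketch of the CI lookup table (two-tailed z-values)
CI =
DATATABLE (
    "Confidence Interval", STRING,
    "z value", DOUBLE,
    { { "90%", 1.645 }, { "95%", 1.960 }, { "99%", 2.576 } }
)
```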
Table – Z-score, Uplift Lower/Upper, P-value and Power
Here I use CI = 95% for a two-tailed test, so to reject the null hypothesis the p-value must be under 0.025, i.e. (1 − 0.95) / 2.
As 0.034 is greater than 0.025, we fail to reject the null hypothesis, and we must assume that the 33% uplift may be due to chance.
Note: Instead of dividing alpha by 2, we can also multiply the p-value by 2, as the p-value for a two-tailed test is always twice the p-value obtained for the one-tailed test.
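As a quick check with the numbers above:

$$2 \times 0.034 = 0.068 > 0.05$$

which leads to the same conclusion as comparing 0.034 with 0.025.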
Table – Conversion Rate with Upper and Lower limit for the two groups
Bar Chart – Conversion Rate with Upper/Lower limit for the two groups
Text – Final Result of the Ab testing experiment.
This post is only a brief introduction to A/B testing with Power BI. It is, of course, impossible to cover every aspect of A/B testing in a single post, as there are entire books written on this subject, with more advanced concepts that are beyond my knowledge.
However, I really wanted to show that we can easily implement A/B testing using Power BI and create the same metrics that most of the A/B testing tools on the market provide.
The only requirement is to connect Power BI to the data generated by your websites (visits, conversions), and that's it!
As said above, I could have integrated an R or Python script in Power BI to get this done, but it wouldn't have added any real value to Power BI.
Using DAX only really shows that Power BI has great statistical analysis capabilities that I feel are little known.
As I love statistics as much as I love Power BI, I hope to write more posts about "statistical concepts using DAX only".
If you'd like to implement A/B testing with Power BI and test your results, I'd recommend using the following A/B testing calculator simulator:
Here is the full PBI report:
Let me know in the comments if you have any questions.