## Import multiple CSV files in R and load them all together in a single data frame

List of all the filenames: one approach I found really straightforward is to create a list of all your filenames. You can also use a pattern to search your directory and return all the matching files. In my example I need to read all the files starting with “FR”. The function lapply (the equivalent of a loop) reads every file present in my list fileNames and stores them in my variable zonnesFiles. The variable zonnesFiles is a list of…
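The approach in this excerpt can be sketched roughly as follows. This is a minimal illustration, not the post's original script: the temporary directory, the three sample files and the final `zonnes` data frame are assumptions made up so the example runs on its own, while `fileNames` and `zonnesFiles` follow the names used in the excerpt.

```r
# Hypothetical setup: write three small CSV files into a temporary directory
# so the example is self-contained. Two of the file names start with "FR".
dataDir <- file.path(tempdir(), "csv_import_demo")
dir.create(dataDir, showWarnings = FALSE)
write.csv(data.frame(zone = "FR01", value = 1:2),
          file.path(dataDir, "FR_north.csv"), row.names = FALSE)
write.csv(data.frame(zone = "FR02", value = 3:4),
          file.path(dataDir, "FR_south.csv"), row.names = FALSE)
write.csv(data.frame(zone = "BE01", value = 5:6),
          file.path(dataDir, "BE_all.csv"), row.names = FALSE)

# List every CSV file whose name starts with "FR"
fileNames <- list.files(dataDir, pattern = "^FR.*\\.csv$", full.names = TRUE)

# Read each file; the result is a list of data frames
zonnesFiles <- lapply(fileNames, read.csv)

# Stack the list into a single data frame (assumes identical columns in every file)
zonnes <- do.call(rbind, zonnesFiles)
```

`do.call(rbind, ...)` is only one way to combine the list, and it requires every file to have the same columns.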

## PowerBI – Dynamic Chart Title

Unlike QlikView, chart titles in PowerBI can only be static, as you can only pass static text in the title parameter. However, there’s a way around it! The workaround I found is pretty simple: you just need to fake a title by creating a measure that contains your title expression and dropping this measure into a Card visual. Then, by applying the same transparency and colours as your chart, you just need to turn off the chart title…

## For Loop vs Vectorization in R

A brief comparison between the for loop and vectorization in R: a short post to illustrate how vectorization in R is much faster than using the common for loop. In this example I created two vectors, a and b, which hold some random numbers. I’ll compute the sum of a and b using the for loop and the vectorized approach, and then compare the execution time taken by the two methods. I’ll repeat this test 10 times with…
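A single run of such a comparison might look like the sketch below. The seed, the vector length `n` and the reading of "the sum of a and b" as an element-wise sum are assumptions, and the variable names other than `a` and `b` are illustrative.

```r
set.seed(42)   # assumed seed, for reproducibility
n <- 100000    # assumed vector length
a <- runif(n)
b <- runif(n)

# For loop: add the vectors one element at a time
loopTime <- system.time({
  sumLoop <- numeric(n)
  for (i in seq_len(n)) {
    sumLoop[i] <- a[i] + b[i]
  }
})

# Vectorization: add the whole vectors in a single operation
vecTime <- system.time({
  sumVec <- a + b
})

# Check that both approaches agree, then compare the elapsed times
identical(sumLoop, sumVec)
rbind(loop = loopTime["elapsed"], vectorized = vecTime["elapsed"])
```

Timings vary between machines and runs, which is why repeating the test several times, as the post does, gives a fairer picture.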

## Central Limit Theorem - example using R

The Central Limit Theorem is probably the most important theorem in statistics. In this post I’ll try to demystify the CLT with clear examples using R. The central limit theorem (CLT) states that, given a sufficiently large sample size from a population with finite variance, the mean of the sample means drawn from that population will be approximately equal to the mean of the original population. Furthermore, the CLT states that as you increase the number of samples…
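A small simulation can make this concrete. The sketch below is not taken from the post: the exponential population, the seed, `sampleSize` and `numSamples` are all assumptions chosen for illustration.

```r
set.seed(123)        # assumed seed, for reproducibility
sampleSize <- 50     # assumed size of each sample
numSamples <- 2000   # assumed number of samples

# Skewed population: exponential with rate 1 (mean 1, standard deviation 1)
sampleMeans <- replicate(numSamples, mean(rexp(sampleSize, rate = 1)))

# The average of the sample means should be close to the population mean of 1
mean(sampleMeans)

# Their spread should be close to sd / sqrt(sampleSize) = 1 / sqrt(50)
sd(sampleMeans)

# Share of sample means within one standard deviation of their mean;
# for a normal distribution this share is about 0.68
mean(abs(sampleMeans - mean(sampleMeans)) < sd(sampleMeans))
```

Calling `hist(sampleMeans)` afterwards shows the roughly bell-shaped distribution, even though the population itself is strongly skewed.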

## Coursera Data Science Specialization Review

“Ask the right questions, manipulate data sets, and create visualizations to communicate results.” “This Specialization covers the concepts and tools you’ll need throughout the entire data science pipeline, from asking the right kinds of questions to making inferences and publishing results. In the final Capstone Project, you’ll apply the skills learned by building a data product using real-world data. At completion, students will have a portfolio demonstrating their mastery of the material.” The JHU Data Science Specialization is one of…

## Human Resources Data Analytics

Using predictive analytics to predict the leavers. The dataset contains the following variables: employee satisfaction level; last evaluation; number of projects; average monthly hours; time spent at the company; whether they have had a work accident; whether they have had a promotion in the last 5 years; department; salary; whether the employee has left. *This dataset is simulated. By using the summary function we can obtain the descriptive statistics of our dataset. Data preparation: followed by the str function…
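To show what `summary` and `str` return on data of this shape, here is a minimal sketch. The data frame `hr`, its column names and every value in it are hypothetical stand-ins invented for illustration; they are not the post's dataset.

```r
# Hypothetical stand-in for the simulated HR dataset (made-up names and values)
hr <- data.frame(
  satisfaction_level    = c(0.38, 0.80, 0.11, 0.72, 0.92, 0.45),
  last_evaluation       = c(0.53, 0.86, 0.88, 0.87, 0.85, 0.50),
  number_project        = c(2, 5, 7, 5, 4, 2),
  average_monthly_hours = c(157, 262, 272, 223, 180, 150),
  time_spend_company    = c(3, 6, 4, 5, 2, 3),
  work_accident         = c(0, 0, 0, 0, 1, 0),
  promotion_last_5years = c(0, 0, 0, 0, 1, 0),
  department            = c("sales", "sales", "IT", "hr", "IT", "support"),
  salary                = c("low", "medium", "medium", "low", "high", "low"),
  left                  = c(1, 1, 1, 0, 0, 1)
)

summary(hr)  # descriptive statistics for each column
str(hr)      # column types and a preview of the values
```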

## Populating Time Dimension

A ready-made script that I have modified to create and populate a Kimball Time dimension. This script will create a time dimension and populate it with different levels of granularity: second, minute, hour.

## Implement Linear Regression in R (single variable)

Linear regression is probably one of the most well-known and widely used algorithms in machine learning. In this post, I will discuss how to implement linear regression step by step in R. Let’s first create our dataset in R, which contains only one variable, “x1”, and the variable that we want to predict, “y”. #Linear regression single […]
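As a rough sketch of a single-variable fit, the example below computes the closed-form least-squares estimates by hand and compares them with R's built-in `lm`. The data values are made up, and the post's own step-by-step method may differ (for instance, it may use gradient descent); only the names `x1` and `y` come from the excerpt.

```r
# Hypothetical dataset: one predictor x1 and the target y (made-up values)
x1 <- c(1, 2, 3, 4, 5)
y  <- c(3.1, 4.9, 7.2, 8.8, 11.0)

# Closed-form least-squares estimates for y = intercept + slope * x1
slope     <- sum((x1 - mean(x1)) * (y - mean(y))) / sum((x1 - mean(x1))^2)
intercept <- mean(y) - slope * mean(x1)

# Same model fitted with R's built-in lm(), for comparison
fit <- lm(y ~ x1)

c(intercept = intercept, slope = slope)
coef(fit)
```

Both lines at the end should print the same two coefficients, which is a quick way to check a hand-written implementation.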

## Stanford Machine Learning: Intro

I have decided to take part in the machine learning course provided by Stanford University. There are now loads of MOOCs, but this course was one of the first programming MOOCs put online by Coursera and it is still ranked first by Class Central. I have now almost completed the 11-week course and I can tell that Stanford Professor Andrew Ng is a brilliant teacher: he is able to explain quite complicated algorithms in a very simple way. This course provides…