Introduction
Microsoft Power BI is a platform Microsoft provided that allows for self service BI work. It is a mistake to think this software simply generates reporting, the central aspect of Power BI is the ability to create a dimensional model representing various entities of a business process. It is a software I have some experience with and wish to use more. This blog post is my first demonstrating power BI and shows a very practical use of Power BI – monitoring ones personal finances, it shows all aspects of the process from data load, model creation, calculating measures, extending the model. this is an end to end walkthrough. Future Power BI posts are likely to be much smaller and focused.
Obtaining Data
I wanted to obtain a dataset showing transactions involved within an individuals bank accounts. Thankfully Kaggle has exactly this at the following page: https://www.kaggle.com/datasets/bukolafatunde/personal-finance. If you aren’t registered on Kaggle you can see the data in my github repository:
These aren’t real transactions but they are representative enough to show how this process could work genuine bank accounts. It is also a single file export, allowing me to show how dimensions can be built out of one file.
Identifying Dimensions
The file personal_transactions.csv contains various transactions over a 2 year period for 3 bank accounts of an individual. If I open the file in Excel I can see the following.
What is a Fact?
A fact is something which has happened, something which is true. In the case of Business Intelligence a fact tends to be something numeric, something to be counted, or summed or measured.
In the table above we have one fact – the transaction, the key element here is the Amount column – this can be measured. So in our model we will have one fact table – transactions
What are Dimensions
Dimensions are information that support a fact, metadata if you wish. So if our fact is a transaction, what supports it? what is the metadata around a transaction, One way of thinking about dimensions is the “who, what, where and why” of the facts.
Based on what we have in this file I would say there are 4 dimensions that I am interested in;
- Date – When the transaction took place
- Category – what was the transaction for
- Transaction Type – Was the an Expense or Income
- Account – What account made the transaction
therefore in our model we will create 4 dimension tables.
Creating the Model – Transforming with Power Query
Now we have an idea on the tables we wish to create, we need to undergo the process of actually importing the data and transforming it from one table to 5 (one fact table, four dimension tables). We then need to transform data and create appropriate relationships between these tables.
Thankfully Power BI is very good at this and has a query editor specifically for these tasks.
Create Dimensions and Keys
We know we want to have dimensions for Date, Category, Transaction Type and Account. Date is a special dimension and will be described later, but what we need to do for the other dimensions is consistent. We wish to take the following steps;
- Extract the distinct values from the transaction file for these columns into new tables.
- Create a key for each distinct value
- Write the correct key for the value back into the fact transactions table and remove the previous text column.
Lets look at Category and see how this works in the Query Editor. Firstly We Select Get Data and pick the csv file containing the transactions. Then click the Transform Data button. At this point you will see the Power Query Editor.
As we are making a new dimension for Category – Right click the Category column and select Add as new query.
This means we will have the main transaction table in one query and have the selected column (category) in another. We know we want to make a new table from this query so select the To Table option.
At this point we have instructed the Query Editor to import the category column as a new table. This was part of the first step to undertake, however we want distinct categories not values repeated. Luckily this is easily resolved. Right click the column and select Remove Duplicates.
We now see a distinct set of categories. Step one is complete.
We now want to make a unique key for each of the distinct categories. This can be done by selecting the Index Column option on the Add Columns tab;
We select the From 1 option and now each of the distinct Categories has a key.
The third step is to relate this key back to the original table. We want the transactions fact table to contain the key that corresponds to the relevant category each transaction was made against. For that we need to join the original table to this newly created one. Before we do this though – lets rename the fields in the Category dimension to something appropriate, perhaps Category and Category Key.
Select the original transactions table (as we want the key to exit in that table) and click the Merge Query button in the Home tab. You should see the following screen.
We want to create an inner join between these two tables based on the Category text. Complete the menu as below.
And click OK
Notice how we have this new column that has a value of “Table” – this is the result of the join. Click the silver button and you can see the fields selected.
we are only interested in the Category Key column. so pick that and click OK. At this point you should see the screen below
So we have create a new category dimension table, given it a key and related this key back to the original transactions table. we have factored out the category from the transaction table as a dimension. We just need to tidy up by removing the original category column (highlighted) as we no need this (we have the relation to the category dimension) and to rename the Category key in this table so it doesn’t include the Category.1. prefix.
This whole process is then repeated for Transaction Type and Account. At the end of this process you should see the original table now looks similar to below;
Tidying up – Transaction Type
While not entirely required I don’t like the terms Debit/Credit for this reporting. This is an individuals bank account and there is no notion of double entry bookkeeping for us to track. I edited this via a conditional column in the Transaction Type dimension as shown below;
So in my reporting debits will be known as expenditure and credits known as Income
End of the Query Editor
At this point I imported the tables into the model. I now have the normal power BI view (not the query editor) but I still have one more dimension table to create – the Dates table.
Create Dates Dimension
The dimension of time will exist in any reporting model and it is somewhat unique. What I want to create a table with dates covering all the transaction dates and to have their corresponding months and years so I can slice the data by different time values. This table can be scripted in DAX.
I create a new calculated table by selecting New Table on the modelling tab. I then enter the following DAX formula;
Dates = CALENDAR(
DATE ( YEAR( MIN(Transactions[TransactionDate])),1,1),
DATE ( YEAR ( MAX(Transactions[TransactionDate])), 12,31)
)
The Calendar function is to create a list of dates between a starting date and an end date. My starting and end dates are the 1st of January from the year of the first transaction in the Transactions Table and the 31st of December From the last. After executing this I have one column with this date range in it – one row per day
What makes the Date table so useful is its ability to aggregate over periods of time, from days to weeks to months to years. We have a date now but it would be much better to add more columns allowing us to perform more time base analysis. Lets Add columns using DAX as follows
Calculated Column Name | Dax Expression | Description |
Year | YEAR(Dates[Date]) | Extracts the year from the Date column |
Month Num | MONTH(Dates[Date]) | A numeric value that represents the month (1 for Jan, 12 for Dec) |
Month | FORMAT(Dates[Date],”MMM”) | A 3 character representation of the month in the date column e.g. Jan, Feb etc… |
DateSort | FORMAT(Dates[Date],”YYYYMMDD”) | A numeric version of the date in the format YYYYMMDD e.g. 20221201 this type of date format can be sorted easily as a number |
The Date dimension table should then look something similar to below
At this point all our dimensions are created and the Fact table (transactions) has an ability to relate to all of them.
Creating Relationships
In the relationship tab of Power BI we can then connect all our dimensions and fact tables together, creating a working dimensional model. connect the keys to the fact table, using the date as the key to the dates dimension, you should see something similar to below;
This is really good progress! we have a model!.
Calculations & Reporting
I know what you are thinking… this is a lot of work and so far no reporting! but trust me the effort we have put in here is well worth it. The model creation is by FAR the most important step in using power BI and now we have this completed this task you will see how powerful this is.
Firstly as best practice we want to create measures on the columns we will be aggerating, currently this is exclusively Amount from the Transactions Table. Lets make a new measure for Amount. The DAX is Amount=SUM(Transactions[Amount]). Ok now we have that lets make some reports.
Calculating Transaction Amounts Over time
So now we have our model – lets show why it is useful. If I want to know the Income I’ve had in 2009 how can I do that? – if you are thinking of writing a lot of code now, don’t.. the model will take care of it for us. Drag the year from the date dimension into a table, drag the transaction Type into a table and drag the measure of amount into a table;
What if we want this broken down by month – yes, you’ve guessed it add the month column from the date table (here I have sorted by amount desc)
So what our model allows us to do is to slice the data by the dimensions we have created. We can also use category to look at 2019 expenditure by category;
So you can really see the “power” (excuse the pun) of Power BI once yo have a model set up.
Initial Dashboard
The dashboard I created is shown below;
It uses two slicers as drop down menus to allow you to look at a particular year and specific account – so for example the screen above is 2019 for the Silver Card account. However if I pick the 2018 checking count all the data updates automatically.
And to some extent this dashboard tells a story of a year. On the line graph you can see that I was saving (above 0 income vs expenditure for this account) Until May. Well what happened in May? if you look at the table on the bottom right yo can see Mikes Construction Co was paid £8,000 in May, the largest single expense of the year. It relates to hoe improvements which were 19% of the yearly expenditure (the pie chart on the bottom left).
The headline figures are shown in the top right – This account has a current balance of 11.25k (assuming it initially started with 0 balance), in 2018 it had 52k income, 40.24k expense meaning it has “Savings” of 11.76k
The creation of these figures and graphs was simple, they just use the relationships of the model and very basic DAX calculations, with some filters applied.
Not bad for an afternoons work you say – but could we make this better?
Extending the Model
One of the real selling points of Power BI is that you can develop rapidly in an agile fashion building upon the previous developments.
When it comes to extending a model there really are two choices;
- Bring in more data and relate it
- Perform more calculations on the data (more measures)
To demonstrate how both of these work I set up another file called Budget.csv This file is very simple and for various months and categories within 2019 it dictates a budget value. It doesn’t care about what account is used, simply a monthly budget for some of the categories of expenditure.
The file in excel is shown below.
It gives us a budget figure for each category and month in 2019. We can extend the model by bringing in this new data, relating it to the transactions and reporting on it.
BACK TO THE QUERY EDITOR!
We import the file as before – mapping the category to the CategoryKey.
So now we have a way of relating the category to the budget – but that is not very useful unless we can apply it to the time period (it will just sum all the budget values for a given category) – we need a way of relating it to the Dates table as well. Once imported we can make a Calculated column to give us the 1st date of each month – e.g. 01/01/2019 for January 2019. Once we have this we could establish a relationship to the Dates table. We can create a calculated column with the following DAX BudgetDate = CONCATENATE(CONCATENATE(Budget[Year],Budget[MonthNumeric]),”01″)
Once we have this we can connect the budget table to the Category and Date dimensions;
Now we could write a few DAX measures to calculate the budget and the difference between that an the actual spend;
Measure Name | DAX | Description |
Budget | SUM(Budget[Budget]) | Sum of the Budget Values |
Budget Variance | [Budget] – [Amount] | The difference between budget and actual spend. |
Budget Variance Percentage | DIVIDE([Budget Variance],[Budget],0) | The Percentage difference between budget and actual spend. |
I created a second page of the dashboard to look at Budgets Vs Actual Spends;
This budget example shows how you can extend an existing model with relating data and use this to calculate more measures which hopefully provide greater insight.
Conclusion
What I was trying to show in this blog is how power BI works in a fairly simplistic way, obviously it can get quite compilated, the DAX expressions in particular. I hope to do future posts on more specific issues / other dashboards in the future, as I mentioned at the start those posts will be nowhere near as long as this, but if you’ve never used power BI before I think walking through the entire process of modelling offers a lot of background into how the software works and how its a lot more advanced that simply coding a report.
I hope that this article has shown with a little investment up front transforming and modelling the data you yield massive benefits downstream. The modelling/Calculating stage of power BI is 80% of the work, however most business facing users often fail to appreciate this and think the work is mainly done in producing graphs, this in my opinion isn’t necessarily the case.
I have a power BI repo on GitHub Containing this example. (PowerBI (github.com)) I hope to add more in the future.