Data Cleaning with Power Query
Share:
Power BI has a rich set of tools for data manipulation and cleaning. One of these tools is Power Query, that makes the process easier and efficient. In Power Query, you can perform a broad range of data transformations that help to improve the quality of your data. Some common tasks include removing null or duplicate values, standardizing data, splitting columns, and others. This chapter will guide you on how to use Power Query for cleaning data in Power BI.
Let's start with an overview of Power Query. Power Query is a data connectivity and data preparation technology that enables you to discover, connect, combine, and refine data across a wide variety of sources. With Power Query, you can clean, reshape, and transform data in an intuitive and straightforward way.
So, let's get started and see how Power Query helps us in cleaning data with some examples:
Importing data
The first step in data cleaning is to import data into Power BI. However, before importing, it would help to have a basic understanding of the structure and quality of your data. For this tutorial, let's assume we are using a dataset containing information about various movies.
// Here is how we import data from an Excel file:
In Power BI Desktop, click on Home > External data > Excel.
Navigate to the location of your file and select it.
Choose the relevant sheets of the Excel file.
Click on Load.
Removing Null or Blank Rows
In our movie dataset, we have some rows where some or all columns have null or blank values. Such rows can distort our analysis as these do not represent meaningful data. Power Query has a user-friendly interface that allows us to remove rows with null or blank values.
// Here is how we remove null or blank rows:
On the Power Query Editor,
Click Home > Remove Rows > Remove Blank Rows, and voila! The null or blank rows are deleted.
Removing Duplicates
Our dataset might have duplicate records, which can skew our analysis by exaggerating the impact of these records. Power Query has a feature to identify and remove these duplicates.
// Here is how we remove duplicate rows:
On the Power Query Editor,
Click Home > Remove Rows > Remove Duplicates, and the duplicate rows will be removed.
Splitting Columns
In our dataset, we have a column called 'name and genre' where both the movie's name and genre are combined into one string. This makes the data analysis difficult. But the good news is Power Query has a split column feature that can easily split this column into two columns: 'name' and 'genre'.
// Here is how we can split columns:
Select the 'name and genre' column.
Click the Transform tab > Text Column group > Split Column.
Choose By Delimiter.
In the pop-up window, choose Space as delimiter.
Click OK.
Now, two columns will be created, namely 'name' and 'genre'.
Renaming Columns
The default column names may not be straightforward, self-explanatory, or aligned with our naming conventions. It will help to rename the column names to more meaningful ones.
// Here is how we rename columns:
Right-click on the column header.
Click Rename.
Enter the new name and hit Enter.
Standardizing Data
The data in some columns might not follow a consistent format. For example, in our 'genre' column, some entries are in lower case, while some are in title case (first letter of the word is capitalized). It would help to have a consistent format for ease of analysis.
// Here is how we transform the case of text:
On the Transform tab,
Click Format.
Choose 'Lowercase'.
After these operations, remember to click Close & Apply
in Power Query to keep the changes and load the data back into Power BI.
In conclusion, successful data analysis involves careful data preparation and cleaning. Power BI’s Power Query offers an intuitive interface and powerful features that make the data cleaning process efficient and effective. By investing time in cleaning your data, you can guarantee the effectiveness and accuracy of your subsequent analysis and gain better insights from your dataset. Armed with the knowledge from this tutorial, you are well equipped to tackle any data cleaning challenge! Happy analyzing!
0 Comment
Sign up or Log in to leave a comment