Dynamically removing top and bottom rows in Power Query can significantly streamline your data cleaning process when dealing with inconsistent data imports. This technique allows you to automatically eliminate unnecessary header rows or footer information based on specific conditions rather than fixed row counts, making your data transformation process more robust and adaptable to changing source files.
The ability to use conditions rather than static numbers is particularly valuable when working with regularly updated reports that may contain varying amounts of metadata or summary information.
In my video, I demonstrate how to implement this dynamic row removal technique that I learned at an Excel London Meetup:
Understanding the Problem with Static Row Removal
When importing data from text files, CSV files, or other sources, you often encounter extraneous information at the top and bottom of your dataset. These might include title rows, explanatory notes, summary statistics, or footer information that aren’t part of the actual data you need to analyze. Using Power Query’s standard “Remove Top Rows” or “Remove Bottom Rows” functions with a fixed number works fine when your source data structure never changes, but becomes problematic when the number of these rows varies.
In the example shown in my video, we have multiple rows of metadata before the actual header row (which contains “Date” as the first column value), and several rows of additional information at the bottom of the data that need to be removed. There are also some missing values throughout the data that make simple filtering ineffective.
Dynamically Removing Top Rows Based on Conditions
The conventional approach to removing top rows in Power Query involves specifying a fixed number. However, this can be problematic when the number of header rows changes. The dynamic solution involves using a condition rather than a fixed count.
Here’s how to implement this technique:
Go to the Home tab in Power Query Editor
Select “Remove Rows” and then “Remove Top Rows”
Instead of entering a number, modify the formula to use the “each” keyword followed by a condition
The formula will look something like this: each [Column1] <> “Date” (assuming “Date” is the header text in your first column). This tells Power Query to keep removing rows until it finds a row where the first column contains the text “Date”.
The Technical Details of the Table.Skip Function
Behind the scenes, Power Query uses the Table.Skip function when you remove top rows. This function has a hidden capability that isn’t obvious from the user interface — it can accept either a count or a condition parameter.
When using a condition, the syntax changes from simply providing a number to using the format:
each [condition]
The condition is evaluated for each row, starting from the top, and rows are removed until the condition is no longer true. This allows for dynamic adaptation to varying source data structures.
Dynamically Removing Bottom Rows
Similarly, we can apply the same concept to remove rows from the bottom of our dataset. This is particularly useful when dealing with files that contain summary information, notes, or other footer data that should be excluded from analysis.
The process for removing bottom rows dynamically is:
Go to the Home tab in Power Query Editor
Select “Remove Rows” and then “Remove Bottom Rows”
Replace the static number with a condition using the “each” keyword
For example, you might use a formula like: each [Merchant] = “” to remove rows from the bottom where the Merchant column contains an empty text string. Or you might use each [Revenue] = null to remove rows where the Revenue column contains null values.
Handling Different Types of Empty Values
When working with bottom rows, it’s important to understand the different types of empty values that might appear in your data:
Empty text strings — represented by “” in formulas
Null values — represented by null in formulas
Missing values — which might be null or empty depending on the data source
In the video demonstration, I show how to handle both empty text strings and null values as conditions for removing bottom rows. The key is to identify which column and which type of empty value reliably indicates the footer section of your data.
Practical Applications and Benefits
This dynamic row removal technique is particularly valuable in several scenarios:
When working with regularly updated reports where the structure might change slightly between versions, this approach ensures your Power Query solution remains robust. It’s also helpful when dealing with data exports from systems that include varying amounts of metadata or when consolidating multiple files that might have different header structures.
The major benefits include:
More resilient data transformation processes that don’t break when source formats change slightly
Reduced need for manual intervention when processing new data
Ability to handle files with inconsistent structure automatically
Greater flexibility compared to static row removal or simple filtering
This technique demonstrates the power of M language in Power Query, allowing for solutions that go beyond what’s immediately available in the user interface. By understanding and leveraging these more advanced capabilities, you can create more robust data transformation processes.
Important Considerations
When implementing this technique, keep in mind a few important points:
The condition you use must reliably identify the boundary between the rows you want to keep and those you want to remove. Choose column values that are consistently present (or consistently absent) at these boundaries. Also be aware that if your condition never evaluates to false, you could potentially remove all rows from your dataset, so testing with representative sample data is essential.
Additionally, remember that this technique works even when you have missing values in your actual data. As shown in the video, the rows are only removed when they match the specific condition you’ve defined, allowing rows with some missing values to be retained as long as they don’t match your removal condition.
If I need to extract only letters from my text, what should I do? Lets find out.
Extract only letters from text
When I need to extract only letters from my text, it means that I also need to extract spaces and enters (new lines).
With Power Query this task is quite easy.
I need to select one cell in my table, then go to the Data tab, and click on the From Table/Range command in order to take our data to Power Query (Fig. 1)
Fig. 1 From Table/Range command
Here, I have four simple pieces of text. One in English and three in Polish. We need to go to the Add column tab, then to the Custom column command (Fig. 2)
Fig. 2 Custom column command
In the Custom Column window, we have the space to name our column. Let’s call it Clean, and below we have the Custom column formula box, where we write the Text.Select function. In the function we need to work with text, so let’s select the right column (1). Then, I need letters. Since there will be string of characters, I have to use curly brackets, where we write upper case letter in the first sequence, which is “A”..“Z”. Then let’s write lower case letters, which is “a”..“z”. Writing two dots between letters means that there is a sequence. Let’s close the curly bracket and the parenthesis. Now, it should be fine (Fig 3)
Fig. 3 Whole function
After pressing OK, the system extracted for us only letters. However, the text presented like this is just gibberish. That’s why we have to add spaces. To do that, we need to go to the View bar, then select the Formula bar checkbox and start modifying our formula. Let’s write a comma and a space in double quotes (1). If we need other signs, we should also write them, e.g. a dot (2). As we can see, we have spaces and dots in our new text (Fig. 4)
Fig. 4 Modifying the formula
However, I don’t need dots in my text. What I really need are enters. We can remove the dot from the formula and write #(lf) in its place. After entering the formula, we can see that we have extracted enters, lower case letters, spaces. But, I didn’t extract Polish letters. In row 4 there are strictly Polish letters. In rows 2 and 3 there are some of them. The question is: how can I extract unusual signs from text using the Text.Select function. First of all, we have to check the Unicode. We can see that letters from A to Z is a sequence of numbers from 65 to 90. It’s similar in lower case letters. The most important lesson from it is that we have a sequence (Fig. 5)
Fig. 5 Unicode
Now, let’s go back to our formula. If I want to have Polish letters, I can write a sequence of “Ó”..“ż”. After entering the formula, we have extracted also Polish signs (Fig. 6)
Fig. 6 Extracting Polish signs
Now, that we have extracted all signs we need, we can go to the Home tab and click on the Close&Load to command to transfer our results to Excel (Fig. 7)
Fig. 7 Loading results to Excel
In the Import Data window, we have to select a cell (Fig. 7)
Fig. 7 Choosing the target cell
As we can see, the text is written in one line, so we need to wrap it. Let’s go to the Home tab and click on the Wrap Text command (Fig. 8)
Fig. 8 Wrap Text command
Now, we clearly see upper case letter, lower case letter, spaces, enters. I removed signs that I didn’t need (Fig. 9)
Today, we want to combine all values from the same group into one cell. Let’s go.
Combine all values within group with delimiter | Excel Power Query 03
In our example, we want to combine all publishers. It means that we’re doing an operation reverse to the one in the previous post. This time, we want to go from this table (Fig. 1):
Fig. 1 Source table
to this table (Fig. 2):
Fig. 2 Target table
We can do it with Power Query. First, select the table. Then, go to the Data tab and select the From Table/Range command (Fig. 3).
Fig. 3 Going to Power Query
We can see that we have our two columns. The first thing we want to do is group our publishers. Let’s select the Publisher header, then go to the Home tab and then Group By command. In the new window, we can see that we are grouping by the Publisher column. We still need a new aggregation. Let’s change the Operation bar into All Rows and write ‘Series’ in the New column name bar. We cannot change anything in the Column bar. Let’s press OK (Fig. 4).
Fig. 4 Group By window
Now, we have the Publishers group, and in the cells from the Series column, we have all rows from the previous stage. The whole table is in one, single cell (Fig. 5).
Fig. 5 Whole table in one cell
However, we don’t need the whole table, which consists of two columns. We only need the Series column. We can change it by modifying the code. Let’s go the View tab, and let’s check whether the Formula bar is ticked. If it is so, we can start modifying the code in the formula bar (Fig. 6).
Fig. 6 Going to the Formula bar
Looking at the code, we can see that we don’t want to choose underscore for each step. It means that we need only one column. That’s why we must write ‘Series’ in the code (1). Now, let’s delete the type from the code (2) and write ‘type as list’. It’s important to write ‘list’ with a lowercase (Fig. 7). And we see that we have only lists when we click on any of the Series cells.
Fig. 7 Code modification
Now, as Power Query treats values as a list, we can extract them by using the Extract Values option (Fig. 8).
Fig. 8 Extracting values
We are extracting the values by combining them with a delimiter. We want to use a comma and a space, so let’s select the Custom option first and then write what we want (Fig. 9). Let’s close it with OK.
Fig. 9 Delimiter
Now, we can see that we combined all Series names into one cell for each publisher. We can now go to the Home tab, then Close & Load command, and press the Close & Load to option (Fig. 10).
Fig. 10 Loading data to Excel
Let’s select the Table and the Existing Worksheet radio buttons in the Import Data window (Fig. 11)
Fig. 11 Data importing
We can see that all values within one group are combined into one cell (Fig. 12)
Let’s assume that we are in the future and we sell spaceships. We want to find out what class of a spaceship each country has bought basing on our table of unique spaceship classes.
Combine all unique values within group | Excel Power Query 04
With Power Query, this task is very simple. We have to select our main table, then go to the Data tab and choose the From Table/Range command in order to export our data from Excel to Power Query (Fig. 1).
Fig. 1 Exporting data to Power Query
Since we are interested which country has bought which spaceship, we want to group our data by the country. We have to select the Country column and go to the Home tab (1) and choose the Group By command (2). In the window, we need to choose the Advanced option (3) because we want to add more aggregation. I want to count how many ships each country has bought as well as know the spaceship class names. Since there isn’t any proper operation, let’s choose the Sum option (4), which we will modify later in the code. In the column bar, we’re choosing the Ship class (5). As a final step, let’s write a new name, which is going to be ‘Unique ship classes’ (6) (Fig. 2)
Fig. 2 Data grouping
In the new table, there are some errors in the last column, because we wanted to sum up text, which is impossible. It means that we need to modify our code. We need to go to the View (1) tab and tick the Formula bar checkbox (2). We can see that the formula uses the List.Sum function (3) (Fig. 3)
Fig. 3 Formula bar
What we want to do is combine all values, so let’s change the List.Sum into the Text.Combine function (1). We also need to add a separator, which will be a comma and a space (2) (Fig. 4)
Fig. 4 Code modification
Now, we can see that Power Query has changed its calculations. All ship classes are combined. However, as I said before, we need only unique values without repeats. We just need to add one function, which is the List.Distinct function. (Fig. 5)
Fig. 5 List.Distinct function
Now, we can see there are no repeats. We can go to the Home tab (1) and to the Close & Load command (2) and the Close & Load to option (3) (Fig. 6)
Fig. 6 Going back to Excel
In Excel, we have to remember to select the Table (1) and Existing worksheet (2) radio buttons. Then choose a target cell (Fig. 7)
Fig. 7 Data importing
As we can see, we have counted how many ships each country has bought as well as the product class (Fig. 8)
Today, we want to find all file names from a folder.
List of file names from a folder | Excel Power Query 05
Let’s see our folder. We have different kinds of files and even a subfolder. There is also an empty folder called Empty, which must stay empty. Our task is to find all files connected to Excel, which means that we will need to do some filtering (Fig. 1)
Fig. 1 Folder content
We know the path, so we can copy it, then go to Excel, and then use Power Query, which is the simplest solution I know. We need to go the Data tab (1), then to the Get Data command (2), then to the From File (3) and From Folder (4) options (Fig. 2)
Fig. 2 From Folder option
In the new window, we can find our folder, for example, by the navigation bar, where we can paste the whole path and press Open (Fig. 3)
Fig. 3 Navigation bar
We can see that Power Query has prepared a whole batch of information from the given folder, as well as from the Subfolder for us. We don’t see the Empty folder, because there aren’t any files. In our situation, it’s fine for us. Then, we need to click on the Transform Data button (Fig. 4)
Fig. 4 Information prepared by Power Query
The Transform Data option gives us the chance to add some filtering to get the data we are interested in. In the window that appeared, we can see that we can filter the information by the Folder Path (1). If we are interested only in the main folder, we can check only the main folder (2) (Fig. 5)
Fig. 5 Filtering by the Folder Path
However, we want all folders, together with the Subfolder, so let’s close this filtering and go to the Extension filtering. Since we are interested in Excel files, we are looking for the ones with an .xls extension. Remember that Power Query is case sensitive, so if we have some suspicions that there may be some upper case letters, we have to transform our Extension column. We need to go to the Transform tab, then to the format command and the lowercase option (Fig. 6)
Fig. 6 Lowercase option
Now, we can filter this column by Text Filters and the Begins with option (Fig. 7)
Fig. 7 Begins with option
We want the text to begin with .xls, which should get all extensions connected with Excel (Fig. 8)
Fig. 8 .xls extension
We can see that we have our chosen extensions as well as the file names also with extensions. This is the information that we needed, so let’s right click on the Name bar, and select the Remove Other Columns option (Fig. 9)
Fig. 9 Remove Other Column option
We are left only with one column, where we have file names with extensions. Sometimes, we don’t need any extensions. In those cases, we can add one more step to Power Query. We have to select the column, then go to the Transform tab, then to the Extract command and the Text Before Delimiter option (Fig. 10)
Fig. 10 Text Before Delimiter option
Our delimiter is just a dot, so let’s write it (Fig. 11)
Fig. 11 Delimiter
Now, we have only file names without extensions. However, let’s say that I still want those extensions. In such a case I need to delete this step (Fig. 12)
Fig. 12 Deleting the last filtering
Now, I go to the Home tab, click on the Close & Load command, then the Close & Load to option (Fig. 13)
Fig. 13 Going back to Excel
Let’s choose the target cell and select the Table radio button (Fig. 14)