Dynamically removing top and bottom rows in Power Query can significantly streamline your data cleaning process when dealing with inconsistent data imports. This technique allows you to automatically eliminate unnecessary header rows or footer information based on specific conditions rather than fixed row counts, making your data transformation process more robust and adaptable to changing source files.
The ability to use conditions rather than static numbers is particularly valuable when working with regularly updated reports that may contain varying amounts of metadata or summary information.
In my video, I demonstrate how to implement this dynamic row removal technique that I learned at an Excel London Meetup:
Understanding the Problem with Static Row Removal
When importing data from text files, CSV files, or other sources, you often encounter extraneous information at the top and bottom of your dataset. These might include title rows, explanatory notes, summary statistics, or footer information that aren’t part of the actual data you need to analyze. Using Power Query’s standard “Remove Top Rows” or “Remove Bottom Rows” functions with a fixed number works fine when your source data structure never changes, but becomes problematic when the number of these rows varies.
In the example shown in my video, we have multiple rows of metadata before the actual header row (which contains “Date” as the first column value), and several rows of additional information at the bottom of the data that need to be removed. There are also some missing values throughout the data that make simple filtering ineffective.
Dynamically Removing Top Rows Based on Conditions
The conventional approach to removing top rows in Power Query involves specifying a fixed number. However, this can be problematic when the number of header rows changes. The dynamic solution involves using a condition rather than a fixed count.
Here’s how to implement this technique:
- Go to the Home tab in Power Query Editor
- Select “Remove Rows” and then “Remove Top Rows”
- Instead of entering a number, modify the formula to use the “each” keyword followed by a condition
The formula will look something like this: each [Column1] <> “Date” (assuming “Date” is the header text in your first column). This tells Power Query to keep removing rows until it finds a row where the first column contains the text “Date”.
The Technical Details of the Table.Skip Function
Behind the scenes, Power Query uses the Table.Skip function when you remove top rows. This function has a hidden capability that isn’t obvious from the user interface — it can accept either a count or a condition parameter.
When using a condition, the syntax changes from simply providing a number to using the format:
each [condition]
The condition is evaluated for each row, starting from the top, and rows are removed until the condition is no longer true. This allows for dynamic adaptation to varying source data structures.
Dynamically Removing Bottom Rows
Similarly, we can apply the same concept to remove rows from the bottom of our dataset. This is particularly useful when dealing with files that contain summary information, notes, or other footer data that should be excluded from analysis.
The process for removing bottom rows dynamically is:
- Go to the Home tab in Power Query Editor
- Select “Remove Rows” and then “Remove Bottom Rows”
- Replace the static number with a condition using the “each” keyword
For example, you might use a formula like: each [Merchant] = “” to remove rows from the bottom where the Merchant column contains an empty text string. Or you might use each [Revenue] = null to remove rows where the Revenue column contains null values.
Handling Different Types of Empty Values
When working with bottom rows, it’s important to understand the different types of empty values that might appear in your data:
- Empty text strings — represented by “” in formulas
- Null values — represented by null in formulas
- Missing values — which might be null or empty depending on the data source
In the video demonstration, I show how to handle both empty text strings and null values as conditions for removing bottom rows. The key is to identify which column and which type of empty value reliably indicates the footer section of your data.
Practical Applications and Benefits
This dynamic row removal technique is particularly valuable in several scenarios:
When working with regularly updated reports where the structure might change slightly between versions, this approach ensures your Power Query solution remains robust. It’s also helpful when dealing with data exports from systems that include varying amounts of metadata or when consolidating multiple files that might have different header structures.
The major benefits include:
- More resilient data transformation processes that don’t break when source formats change slightly
- Reduced need for manual intervention when processing new data
- Ability to handle files with inconsistent structure automatically
- Greater flexibility compared to static row removal or simple filtering
This technique demonstrates the power of M language in Power Query, allowing for solutions that go beyond what’s immediately available in the user interface. By understanding and leveraging these more advanced capabilities, you can create more robust data transformation processes.
Important Considerations
When implementing this technique, keep in mind a few important points:
The condition you use must reliably identify the boundary between the rows you want to keep and those you want to remove. Choose column values that are consistently present (or consistently absent) at these boundaries. Also be aware that if your condition never evaluates to false, you could potentially remove all rows from your dataset, so testing with representative sample data is essential.
Additionally, remember that this technique works even when you have missing values in your actual data. As shown in the video, the rows are only removed when they match the specific condition you’ve defined, allowing rows with some missing values to be retained as long as they don’t match your removal condition.