Dynamically removing top and bottom rows in Power Query can significantly streamline your data cleaning process when dealing with inconsistent data imports. This technique allows you to automatically eliminate unnecessary header rows or footer information based on specific conditions rather than fixed row counts, making your data transformation process more robust and adaptable to changing source files.
The ability to use conditions rather than static numbers is particularly valuable when working with regularly updated reports that may contain varying amounts of metadata or summary information.
In my video, I demonstrate how to implement this dynamic row removal technique that I learned at an Excel London Meetup:
Understanding the Problem with Static Row Removal
When importing data from text files, CSV files, or other sources, you often encounter extraneous information at the top and bottom of your dataset. These might include title rows, explanatory notes, summary statistics, or footer information that aren’t part of the actual data you need to analyze. Using Power Query’s standard “Remove Top Rows” or “Remove Bottom Rows” functions with a fixed number works fine when your source data structure never changes, but becomes problematic when the number of these rows varies.
In the example shown in my video, we have multiple rows of metadata before the actual header row (which contains “Date” as the first column value), and several rows of additional information at the bottom of the data that need to be removed. There are also some missing values throughout the data that make simple filtering ineffective.
Dynamically Removing Top Rows Based on Conditions
The conventional approach to removing top rows in Power Query involves specifying a fixed number. However, this can be problematic when the number of header rows changes. The dynamic solution involves using a condition rather than a fixed count.
Here’s how to implement this technique:
Go to the Home tab in Power Query Editor
Select “Remove Rows” and then “Remove Top Rows”
Instead of entering a number, modify the formula to use the “each” keyword followed by a condition
The formula will look something like this: each [Column1] <> "Date" (assuming “Date” is the header text in your first column). This tells Power Query to keep removing rows until it finds a row where the first column contains the text “Date”.
The Technical Details of the Table.Skip Function
Behind the scenes, Power Query uses the Table.Skip function when you remove top rows. This function has a hidden capability that isn’t obvious from the user interface — it can accept either a count or a condition parameter.
When using a condition, the second argument changes from a simple number to a function of the form:
each <condition>
where <condition> is a logical test evaluated against each row, such as [Column1] <> "Date".
The condition is evaluated for each row, starting from the top, and rows are removed until the condition is no longer true. This allows for dynamic adaptation to varying source data structures.
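Put together, the generated step might look like the sketch below, assuming the header text “Date” sits in the default “Column1” and the previous step is named Source:

```m
// Skip rows from the top while the first column is not "Date".
// Table.Skip stops at the first row where the condition is false,
// so the header row itself is kept.
RemovedTopRows = Table.Skip(Source, each [Column1] <> "Date")
```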
Dynamically Removing Bottom Rows
Similarly, we can apply the same concept to remove rows from the bottom of our dataset. This is particularly useful when dealing with files that contain summary information, notes, or other footer data that should be excluded from analysis.
The process for removing bottom rows dynamically is:
Go to the Home tab in Power Query Editor
Select “Remove Rows” and then “Remove Bottom Rows”
Replace the static number with a condition using the “each” keyword
For example, you might use a formula like: each [Merchant] = "" to remove rows from the bottom where the Merchant column contains an empty text string. Or you might use each [Revenue] = null to remove rows where the Revenue column contains null values.
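Behind the scenes this uses Table.RemoveLastN, which, like Table.Skip, accepts a condition in place of a count. A minimal sketch, with step and column names assumed from the example:

```m
// Remove rows from the bottom while the Merchant column is empty text.
// Table.RemoveLastN works upward from the last row and stops at the
// first row where the condition is false.
RemovedBottomRows = Table.RemoveLastN(PreviousStep, each [Merchant] = "")
```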
Handling Different Types of Empty Values
When working with bottom rows, it’s important to understand the different types of empty values that might appear in your data:
Empty text strings — represented by “” in formulas
Null values — represented by null in formulas
Missing values — which might be null or empty depending on the data source
In the video demonstration, I show how to handle both empty text strings and null values as conditions for removing bottom rows. The key is to identify which column and which type of empty value reliably indicates the footer section of your data.
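If the footer rows mix both kinds of emptiness, the two tests can be combined in a single condition. A sketch, with step and column names assumed from the example:

```m
// Remove trailing rows where Merchant is empty text OR Revenue is null.
RemovedFooter = Table.RemoveLastN(
    PreviousStep,
    each [Merchant] = "" or [Revenue] = null
)
```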
Practical Applications and Benefits
This dynamic row removal technique is particularly valuable in several scenarios:
When working with regularly updated reports where the structure might change slightly between versions, this approach ensures your Power Query solution remains robust. It’s also helpful when dealing with data exports from systems that include varying amounts of metadata or when consolidating multiple files that might have different header structures.
The major benefits include:
More resilient data transformation processes that don’t break when source formats change slightly
Reduced need for manual intervention when processing new data
Ability to handle files with inconsistent structure automatically
Greater flexibility compared to static row removal or simple filtering
This technique demonstrates the power of M language in Power Query, allowing for solutions that go beyond what’s immediately available in the user interface. By understanding and leveraging these more advanced capabilities, you can create more robust data transformation processes.
Important Considerations
When implementing this technique, keep in mind a few important points:
The condition you use must reliably identify the boundary between the rows you want to keep and those you want to remove. Choose column values that are consistently present (or consistently absent) at these boundaries. Also be aware that if your condition never evaluates to false, you could potentially remove all rows from your dataset, so testing with representative sample data is essential.
Additionally, remember that this technique works even when you have missing values in your actual data. As shown in the video, the rows are only removed when they match the specific condition you’ve defined, allowing rows with some missing values to be retained as long as they don’t match your removal condition.
Splitting text by length in Power Query can transform cluttered data into organized, usable information without complex formulas. This technique allows you to break down text strings into separate columns based on specific character positions, making it especially useful when dealing with fixed-width data formats that contain multiple pieces of information.
The real power of this approach lies in its ability to handle irregular splitting requirements where each section has a different length.
In my video, I demonstrate the complete process of splitting text by length in Power Query:
Understanding the Data Structure
When working with text data that needs to be split, it’s essential to first analyze the structure of your text. In the demonstration, I work with a dataset where each text string contains several pieces of information with varying lengths:
Person information (30 characters)
Delimiter characters (semicolons, pipes) that need to be removed
Date information (appearing as numbers)
Currency values in different formats
The challenge lies in the fact that each section has a different length, making standard split functions less effective. This is precisely where Power Query’s split by position feature becomes invaluable.
Importing Data to Power Query
The first step in the process is to import your data into Power Query. This can be done easily by selecting your data table and using the From Table/Range option in the Data tab. Once your data is in Power Query, you’ll have access to powerful transformation tools that aren’t available in standard Excel.
Power Query provides a user-friendly interface where you can see your data and apply various transformations step by step. This visual approach makes it easier to track changes and ensure that your data is being processed correctly.
Splitting Text by Position
With the data imported into Power Query, we can now split the text column based on specific positions. Here’s how to do it:
Go to the Home tab in Power Query
Select Split Column and choose “By Positions” (not “By Number of Characters”)
Enter the specific positions where you want to split the text
In the example, I needed to split at positions 0, 30, 31, 39, and 41. It’s important to note that Power Query counts from zero for the first position, not one. These numbers represent the starting points for each section of text.
After pressing OK, Power Query creates new columns based on these position splits. The result is five separate columns, each containing a distinct part of the original text string.
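The generated step uses Table.SplitColumn with Splitter.SplitTextByPositions. A sketch with the positions from the example (the source column and step names are assumptions):

```m
// Split Column1 at the listed character positions (zero-based).
// Five positions produce five new columns: Text.1 through Text.5.
SplitByPositions = Table.SplitColumn(
    Source,
    "Column1",
    Splitter.SplitTextByPositions({0, 30, 31, 39, 41}),
    {"Text.1", "Text.2", "Text.3", "Text.4", "Text.5"}
)
```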
Refining the Split Data
Renaming and Removing Columns
After splitting the text, we need to organize our data by giving meaningful names to the important columns and removing unnecessary ones. In the formula bar, I renamed the columns to reflect their content:
“Text.1” became “Person”
“Text.3” became “Date”
“Text.5” became “Payout”
For columns containing delimiters or other unwanted information (in this case, “Text.2” and “Text.4”), we can simply delete them by selecting them with Ctrl+click and pressing the Delete key.
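In M, the renaming and removal steps might read as follows (step names are assumptions):

```m
// Rename the columns we keep, then drop the delimiter-only columns.
RenamedColumns = Table.RenameColumns(
    SplitByPositions,
    {{"Text.1", "Person"}, {"Text.3", "Date"}, {"Text.5", "Payout"}}
),
RemovedColumns = Table.RemoveColumns(RenamedColumns, {"Text.2", "Text.4"})
```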
Correcting Data Types
Once we have our properly named columns, we need to ensure that each column has the correct data type. In the demonstration:
The “Date” column was initially recognized as an integer and needed to be converted to the date data type
The “Payout” column contained currency values in different formats that needed proper interpretation
Power Query can automatically detect and convert data types, but sometimes manual intervention is necessary. By clicking on the data type icon in the column header, you can force Power Query to interpret the data as a specific type.
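A sketch of the type-correction step, assuming the column names above:

```m
// Force the Date column to the date type and Payout to currency.
CorrectedTypes = Table.TransformColumnTypes(
    RemovedColumns,
    {{"Date", type date}, {"Payout", Currency.Type}}
)
```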
Working with Regional Settings
An important aspect of working with dates and currency values is understanding how regional settings affect data interpretation. In Power Query, you can adjust these settings to match the format of your data.
To access these settings, go to:
Options and Settings
Query Options
Current Workbook
Regional Settings
In my demonstration, the regional settings were set to “English (United Kingdom)” which allowed Power Query to correctly interpret the pound (£) currency symbols regardless of their position in the text. If your data uses different regional formats, you can adjust these settings to match your needs.
For individual columns, you can also click on the data type icon and select “Using Locale” to specify both the data type and regional format for that particular column. This gives you fine-grained control over how Power Query interprets your data.
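Per-column locale settings appear in M as an optional culture argument to Table.TransformColumnTypes. A sketch, assuming the Payout column from the example:

```m
// Parse Payout as currency using the en-GB (English, United Kingdom)
// locale, so pound symbols and UK number formats parse correctly.
PayoutTyped = Table.TransformColumnTypes(
    PreviousStep,
    {{"Payout", Currency.Type}},
    "en-GB"
)
```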
Finalizing the Transformation
After completing all the necessary transformations — splitting the text, renaming columns, removing unnecessary columns, and setting the correct data types — the final step is to load the transformed data back into Excel.
From the Home tab, select “Close & Load” to export your properly organized table back to Excel. The result is a clean, structured dataset with separate columns for person information, dates, and payment values, all with the appropriate data types.
This technique of splitting text by position in Power Query is particularly useful when dealing with fixed-width data exports from legacy systems, standardized report outputs, or any situation where text strings contain multiple data points at known positions. By mastering this approach, you can quickly transform dense, combined text fields into organized and usable data.
Importing fixed-width column data to Excel can be challenging when each column has a set number of characters, often resulting in unnecessary spaces. In this article, I’ll show you how to properly import fixed-width CSV files using Power Query, ensuring clean data without those problematic extra spaces.
This technique is essential for handling structured data exports from legacy systems or standardized reports.
Understanding Fixed-Width Column Data
Fixed-width column data is a specific format where each column occupies exactly the same number of characters in every row. Unlike delimited formats (like comma-separated values), fixed-width files allocate a predetermined number of spaces for each field. This means shorter values get padded with spaces to maintain the fixed width. For example, the date column might always be 10 characters wide, the merchant column 20 characters, and so on.
This format is common in older systems, government data, and standardized reports, but it presents unique challenges when importing to Excel. The main issue is that these extra spaces can cause problems with data analysis and visualization if not properly handled during the import process.
Importing Fixed-Width Data Using Power Query
Power Query provides excellent tools for handling fixed-width column data. Here’s the process to properly import this type of data:
Initial Import Steps
To begin importing fixed-width data, follow these steps:
Go to the Data tab in Excel
Click on “From Text/CSV” option
Locate and select your fixed-width file
When the Power Query Editor opens, you might need to adjust the window size to see all available options, as sometimes the third column of settings may be hidden if the window is too narrow. Make sure to expand the window to see all import settings.
Configuring Import Settings
During the import process, you’ll need to configure several important settings:
First, check the file origin (encoding) — UTF‑8 works well for most modern files. Then, verify that the delimiter is set correctly. For fixed-width files, Power Query typically automatically detects this format. You should see “Fixed Width” selected rather than comma, tab, or other delimiters.
The data type detection setting is another crucial choice. You have three options:
Based on first 200 rows
Based on entire dataset
Do not detect data types
In most cases with fixed-width data, it’s better to select “Do not detect data types” initially and handle data type conversion after cleaning the spaces. This prevents Power Query from making assumptions about your data types that might be affected by the extra spaces.
Handling Extra Spaces in Power Query
The key challenge with fixed-width data is managing the extra spaces that pad each column. When examining the data in Power Query, you’ll notice these extra spaces in both the column headers and text fields. These spaces are necessary in the original file to maintain the fixed-width format, but they’re problematic for analysis in Excel.
For example, when clicking on a column header, you might see something like “Date ” with several spaces after the actual name. Similarly, text fields like product names or merchant names will have trailing spaces to fill their allocated width.
Removing Extra Spaces
To clean up the data properly:
Select all columns in the Power Query Editor by clicking the first column header and then Shift-clicking the last column
Go to the Transform tab
Click on “Format” and select “Trim”
This important step removes all leading and trailing white spaces from every cell in your dataset while preserving spaces that are actually part of your data (like spaces between words in product names). After trimming the data, you’ll notice a significant improvement in how your data looks.
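The Trim command generates a single step. A sketch, assuming all columns are still text (which they are if you chose not to detect data types on import):

```m
// Apply Text.Trim as the default transformation to every column:
// an empty operations list plus a default function trims all columns.
TrimmedText = Table.TransformColumns(Source, {}, Text.Trim)
```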
Setting Headers and Data Types
After removing the extra spaces, you can proceed with configuring headers and data types:
Use the “Use First Row as Headers” option (found on the Home tab or the Transform tab) to set your headers properly. Now that the spaces have been removed, your column names will appear clean and professional.
With clean data, you can now set appropriate data types for each column. For example, date columns should be formatted as dates, and revenue or cost columns as currency or decimal numbers. Power Query will help identify appropriate types, but you can manually change them by clicking the data type icon in each column header.
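In M, these two steps might read as follows (the column names “Date” and “Revenue” are assumptions from the example):

```m
// Promote the first, now-trimmed row to column headers, then set types.
PromotedHeaders = Table.PromoteHeaders(TrimmedText, [PromoteAllScalars = true]),
ChangedTypes = Table.TransformColumnTypes(
    PromotedHeaders,
    {{"Date", type date}, {"Revenue", Currency.Type}}
)
```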
Loading Data to Excel
The final step is to load your cleaned data into Excel:
Click “Close & Load” (or “Close & Load To” for more options)
Select whether to load as a table or to a specific location
Choose an existing worksheet or create a new one
When loading the data, you have the option to add it to the data model if you plan to use it with Power Pivot, but this isn’t necessary for basic Excel analysis. One important thing to note is that while Power Query may detect currency data types, this doesn’t automatically apply Excel’s currency formatting to your table. You’ll need to apply appropriate number formatting in Excel after importing.
Fortunately, when you refresh your data in the future, Excel will maintain any number formatting you’ve applied to the table, making future updates seamless.
Benefits of Using Power Query for Fixed-Width Data
Using Power Query for importing fixed-width data offers several advantages over other methods:
The ability to transform data before loading it to Excel saves significant time compared to manual cleanup after import. The process is also repeatable — if you receive updated versions of the same file, you can simply refresh your query rather than repeating the entire import and cleanup process.
Additionally, Power Query maintains a record of all transformations applied, making your data preparation process transparent and easily adjustable if needed. This approach ensures consistency in how your fixed-width data is processed each time.
Merging tables in Power Query with summarized data provides a powerful way to consolidate information without overwhelming detail. By aggregating invoice details such as counts, sums, averages, and medians, you can create concise reports that highlight key insights while maintaining the original table structure.
This technique is particularly valuable when dealing with sales data where each invoice contains multiple line items that need to be condensed.
In my video tutorial, I demonstrate how to effectively merge tables with summarization in Power Query:
Understanding the Problem: Merging Tables with Summarization
When working with relational data like invoices and their detailed items, we often need to combine information from multiple tables. However, simply merging tables can result in redundant data and unnecessary complexity. The challenge is to merge data while simultaneously summarizing the details to provide meaningful insights.
In this demonstration, I work with two tables: one containing basic invoice information and another with detailed line items for each invoice. The goal is to enhance the invoice table with summarized information from the details table, such as the count of items, average quantities, sum of prices, and even statistical measures like medians.
Loading Tables into Power Query
The first step in our process is to load both tables into Power Query for manipulation. This gives us access to the powerful data transformation capabilities that allow for sophisticated merging and summarization.
To begin, I navigate to the Data tab in Excel and select “From Table/Range” to import our first table into Power Query. For this initial details table, I choose to load it only as a connection rather than importing it directly into the worksheet. This is done by selecting “Close and Load To” and then choosing the “Connection only” option.
After loading the details table, I follow the same process to load the invoice table. This time, however, we’ll perform our merge operation directly within this query rather than creating a new one.
Merging the Tables
With both tables loaded into Power Query, we can now merge them using the common invoice ID field. This is where the power of our approach becomes evident.
To perform the merge:
Select the “Merge Queries” command from the ribbon
Choose the second table (details) to merge with
Select matching columns for merging (invoice ID in both tables)
Choose “Left Outer Join” as the join kind to ensure all invoices are included
After pressing OK, Power Query creates a new column in our invoice table that contains the matching rows from the details table. You’ll notice that for each invoice, we now have a nested table showing all the corresponding detail rows. For example, some invoices might have two detail rows, others three, and some even five or more.
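The Merge Queries command generates a Table.NestedJoin step. A sketch, with table and column names assumed from the example:

```m
// Left outer join: keep every invoice, and nest the matching detail
// rows in a new "Details" column.
Merged = Table.NestedJoin(
    Invoices, {"Invoice ID"},
    InvoiceDetails, {"Invoice ID"},
    "Details", JoinKind.LeftOuter
)
```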
The Key Difference: Aggregating Instead of Expanding
The critical step that differentiates this approach from a standard merge is what we do next. Instead of expanding these nested tables (which would create duplicate invoice rows), we want to aggregate the information they contain.
When we click on the expand button (the small triangle) in the column header, we get options to expand or aggregate. By choosing to aggregate, we can perform calculations across all the detail rows for each invoice.
Creating Summarized Metrics
The power of this technique lies in the variety of aggregations we can perform on the detailed data. For our invoice example, I demonstrate several useful metrics:
Count of items: Simple count of how many line items exist for each invoice
Average quantity: Instead of summing quantities, we calculate the average number of items per line
Sum of unit price: Total value of the invoice
Median of unit price: Statistical measure showing the middle value of all unit prices on the invoice
To create these aggregations, I select the appropriate options from the dropdown menu when expanding the nested table column. By unchecking “Use original column name as prefix,” we get cleaner column names in our results.
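Choosing Aggregate generates a Table.AggregateTableColumn step. A sketch, with detail-column names assumed from the example:

```m
// Collapse the nested "Details" tables into summary columns.
// Each entry is {detail column, aggregation function, new column name}.
Aggregated = Table.AggregateTableColumn(
    Merged, "Details",
    {
        {"Quantity", List.Count, "Count"},
        {"Quantity", List.Average, "Average Quantity"},
        {"Unit Price", List.Sum, "Sum of Unit Price"},
        {"Unit Price", List.Median, "Median of Unit Price"}
    }
)
```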
Refining the Results
After creating our aggregated columns, we may want to refine the column names for clarity. This can be done directly in the formula bar in Power Query. For example, I rename one of the columns to “Count” to better reflect its purpose.
It’s also important to check the data types of our new columns. Power Query automatically assigns data types, but they may need adjustment. In our case, the “Count” column should be set as a whole number, since counts are always integers.
When working with monetary values like the sum of unit prices, Power Query may preserve the currency data type. However, it’s worth noting that this data type information doesn’t automatically translate to formatting in Excel. We’ll need to apply currency formatting in Excel after loading the data.
Loading the Final Results
Once we’re satisfied with our merged and summarized table, we can load it back to Excel. From the Home tab in Power Query, I select “Close and Load To” and choose to create a new table in the existing worksheet.
The final result is an enhanced invoice table that includes aggregated information from the details table without duplicating invoice records. This gives us a concise view of our data with valuable insights such as the number of items per invoice, average quantities, total values, and statistical measures.
This technique is particularly valuable for financial analysis, sales reporting, and any situation where you need to combine master records with summarized information from related detail records. By leveraging Power Query’s merging and aggregation capabilities, we can create more informative reports while maintaining a clean, non-redundant data structure.
Sales data analysis often requires summarizing results by specific categories to gain valuable business insights. In this article, I demonstrate how to efficiently calculate sales summaries using Power Query’s grouping functionality, providing a powerful alternative to traditional Excel formulas for analyzing merchant performance, income totals, and product distribution.
The grouping technique shown allows for multiple aggregations in a single operation, dramatically improving efficiency when working with larger datasets.
Watch my detailed tutorial video below to follow along with each step:
Understanding the Sales Summary Challenge
When analyzing sales data, we often need to create summaries that show performance metrics grouped by specific categories. In this demonstration, I’m working with a dataset containing sales transactions that include merchants, products, dates, and income values. My goal is to summarize this data to show several key metrics for each merchant:
Count of sales transactions
Sum of income
Average income per transaction
List of unique products sold
While these calculations can be done using Excel formulas (which I’ve prepared in the worksheet for comparison), using Power Query offers significant advantages, especially for larger datasets or repetitive analysis tasks.
Importing Data into Power Query
The first step in our process is to bring the data into Power Query where we can perform the grouping operations:
Select the data table in Excel
Navigate to the Data tab in the Excel ribbon
Click “From Table/Range” to import the data into Power Query
Once the data is loaded into the Power Query Editor, we can see all our sales records and begin the grouping process to create our summary.
Creating Groups in Power Query
To summarize our data by merchant, we need to use the Group By function in Power Query:
Select the Merchant column in the Power Query Editor
Go to the Home tab and click “Group By”
The default Group By dialog offers basic functionality with just one column for grouping and a single aggregation. For our needs, we require the Advanced option since we want to create multiple aggregations based on the merchant column.
Setting Up Advanced Grouping
In the advanced Group By dialog, we need to configure multiple aggregations:
Set the grouping column to Merchant
Add the following aggregations:
“Count Rows” for the number of sales (named “Count”)
“Sum” of the Income column (named “Sum”)
“Average” of the Income column (named “Average”)
“All Rows” to capture all related data (named “List”)
After clicking OK, Power Query creates a new table with our merchant groups and the specified aggregations. The Count, Sum, and Average values match the Excel formulas we prepared earlier, confirming that our grouping is working correctly.
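The generated Table.Group step might look like this sketch (the step name Source and the column names are assumptions from the example):

```m
// Group by Merchant with four aggregations in a single operation.
Grouped = Table.Group(Source, {"Merchant"}, {
    {"Count", each Table.RowCount(_), Int64.Type},
    {"Sum", each List.Sum([Income]), type number},
    {"Average", each List.Average([Income]), type number},
    {"List", each _}  // "All Rows": the full nested table per merchant
})
```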
Customizing the Product List
The “All Rows” aggregation creates a nested table for each merchant containing all their sales records. However, we only want a list of unique products sold by each merchant, not all the row data. Since the Group By dialog doesn’t offer this specific option, we need to modify the M formula directly.
Modifying the M Formula
To customize our product list, we need to edit the M code that Power Query generated:
The default “All Rows” aggregation produces tables with all columns for each merchant
We need to change this to extract only the Product column values
Furthermore, we need to eliminate duplicates from these lists
The original aggregation uses each _, where the underscore refers to the entire nested table for each merchant. We need to replace the _ with a reference to just the Product column, [Product], which returns that column as a list. Additionally, we wrap this in the List.Distinct function to remove duplicates:
The modified aggregation should look like: each List.Distinct([Product])
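After the edit, the grouping step might read like this sketch (step and column names assumed as before):

```m
// The "List" aggregation now returns the distinct products per merchant
// instead of the whole nested table of rows.
Grouped = Table.Group(Source, {"Merchant"}, {
    {"Count", each Table.RowCount(_), Int64.Type},
    {"Sum", each List.Sum([Income]), type number},
    {"Average", each List.Average([Income]), type number},
    {"List", each List.Distinct([Product])}
})
```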
Transforming the Product List into Text
Now that we have lists of unique products for each merchant, we can transform these lists into comma-separated text values for better readability:
Click on the expand button in the List column header
Select “Extract Values…” rather than “Expand to New Rows”
Choose a custom delimiter (comma followed by space: “, ”)
Click OK
This transforms our product lists into readable text strings showing all the unique products each merchant has sold, separated by commas.
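Extract Values generates a Text.Combine transformation over the list column. A sketch:

```m
// Join each merchant's product list into one comma-separated string.
ExtractedValues = Table.TransformColumns(
    Grouped,
    {{"List", each Text.Combine(List.Transform(_, Text.From), ", "), type text}}
)
```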
Loading the Results Back to Excel
Once we’ve created our summary with all the required metrics, we can load the results back to Excel:
Go to the Home tab and click “Close & Load To…”
Select “Existing Worksheet” and choose where to place the results
Click OK
After loading, we can format the Sum and Average columns as currency to improve readability. The final result is a clean summary table showing the count of sales, total income, average income, and unique products for each merchant.
Benefits of Using Power Query for Grouping
While our example used a relatively small dataset, the advantages of using Power Query for this type of analysis become even more apparent with larger data volumes:
Efficiency — Performs calculations much faster than Excel formulas for large datasets
Repeatability — The query can be refreshed when data changes without redoing formulas
Flexibility — Allows for complex aggregations and transformations that would be difficult with Excel formulas
Data cleaning — Integrates with other Power Query transformations for comprehensive data preparation
By mastering this grouping technique in Power Query, you can create sophisticated sales summaries and other data aggregations with just a few clicks, saving significant time and effort compared to traditional Excel methods.