Dynamically removing top and bottom rows in Power Query can significantly streamline your data cleaning process when dealing with inconsistent data imports. This technique allows you to automatically eliminate unnecessary header rows or footer information based on specific conditions rather than fixed row counts, making your data transformation process more robust and adaptable to changing source files.
The ability to use conditions rather than static numbers is particularly valuable when working with regularly updated reports that may contain varying amounts of metadata or summary information.
In my video, I demonstrate how to implement this dynamic row removal technique that I learned at an Excel London Meetup:
Understanding the Problem with Static Row Removal
When importing data from text files, CSV files, or other sources, you often encounter extraneous information at the top and bottom of your dataset. These might include title rows, explanatory notes, summary statistics, or footer information that aren’t part of the actual data you need to analyze. Using Power Query’s standard “Remove Top Rows” or “Remove Bottom Rows” functions with a fixed number works fine when your source data structure never changes, but becomes problematic when the number of these rows varies.
In the example shown in my video, we have multiple rows of metadata before the actual header row (which contains “Date” as the first column value), and several rows of additional information at the bottom of the data that need to be removed. There are also some missing values throughout the data that make simple filtering ineffective.
Dynamically Removing Top Rows Based on Conditions
The conventional approach to removing top rows in Power Query involves specifying a fixed number. However, this can be problematic when the number of header rows changes. The dynamic solution involves using a condition rather than a fixed count.
Here’s how to implement this technique:
Go to the Home tab in Power Query Editor
Select “Remove Rows” and then “Remove Top Rows”
Instead of entering a number, modify the formula to use the “each” keyword followed by a condition
The formula will look something like this: each [Column1] <> “Date” (assuming “Date” is the header text in your first column). This tells Power Query to keep removing rows until it finds a row where the first column contains the text “Date”.
The Technical Details of the Table.Skip Function
Behind the scenes, Power Query uses the Table.Skip function when you remove top rows. This function has a hidden capability that isn’t obvious from the user interface — it can accept either a count or a condition parameter.
When using a condition, the syntax changes from simply providing a number to supplying a function of the form each <condition>. (Note that in M, square brackets denote a field reference, so the condition itself will typically look like each [Column1] <> “Date”.)
The condition is evaluated for each row, starting from the top, and rows are removed until the condition is no longer true. This allows for dynamic adaptation to varying source data structures.
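To make this concrete, here is a sketch of what the generated step might look like, assuming the previous step is named Source and the raw first column is called Column1 (both names are assumptions, not necessarily what your query uses):

```m
// Skip rows from the top while the first column is not "Date".
// The first row where [Column1] = "Date" is kept, so it can then
// be promoted to headers with "Use First Row as Headers".
Skipped = Table.Skip(Source, each [Column1] <> "Date")
```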
Dynamically Removing Bottom Rows
Similarly, we can apply the same concept to remove rows from the bottom of our dataset. This is particularly useful when dealing with files that contain summary information, notes, or other footer data that should be excluded from analysis.
The process for removing bottom rows dynamically is:
Go to the Home tab in Power Query Editor
Select “Remove Rows” and then “Remove Bottom Rows”
Replace the static number with a condition using the “each” keyword
For example, you might use a formula like: each [Merchant] = “” to remove rows from the bottom where the Merchant column contains an empty text string. Or you might use each [Revenue] = null to remove rows where the Revenue column contains null values.
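Behind the scenes, removing bottom rows calls Table.RemoveLastN, which — like Table.Skip — accepts a condition in place of a count. Assuming a previous step named PromotedHeaders and a Merchant column (names taken from the example, adjust to your query), the step might read:

```m
// Remove rows from the bottom while Merchant is empty or null;
// removal stops at the first row from the bottom that has a real value
RemovedBottom = Table.RemoveLastN(PromotedHeaders,
    each [Merchant] = "" or [Merchant] = null)
```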
Handling Different Types of Empty Values
When working with bottom rows, it’s important to understand the different types of empty values that might appear in your data:
Empty text strings — represented by “” in formulas
Null values — represented by null in formulas
Missing values — which might be null or empty depending on the data source
In the video demonstration, I show how to handle both empty text strings and null values as conditions for removing bottom rows. The key is to identify which column and which type of empty value reliably indicates the footer section of your data.
Practical Applications and Benefits
This dynamic row removal technique is particularly valuable in several scenarios:
When working with regularly updated reports where the structure might change slightly between versions, this approach ensures your Power Query solution remains robust. It’s also helpful when dealing with data exports from systems that include varying amounts of metadata or when consolidating multiple files that might have different header structures.
The major benefits include:
More resilient data transformation processes that don’t break when source formats change slightly
Reduced need for manual intervention when processing new data
Ability to handle files with inconsistent structure automatically
Greater flexibility compared to static row removal or simple filtering
This technique demonstrates the power of M language in Power Query, allowing for solutions that go beyond what’s immediately available in the user interface. By understanding and leveraging these more advanced capabilities, you can create more robust data transformation processes.
Important Considerations
When implementing this technique, keep in mind a few important points:
The condition you use must reliably identify the boundary between the rows you want to keep and those you want to remove. Choose column values that are consistently present (or consistently absent) at these boundaries. Also be aware that if your condition never evaluates to false, you could potentially remove all rows from your dataset, so testing with representative sample data is essential.
Additionally, remember that this technique works even when you have missing values in your actual data. As shown in the video, the rows are only removed when they match the specific condition you’ve defined, allowing rows with some missing values to be retained as long as they don’t match your removal condition.
Pivot tables offer a powerful way to analyze data, particularly when you need to understand proportions within hierarchical categories. In this tutorial, I’ll show you how to quickly add sums in a pivot table and display values as a percentage of their parent row, allowing for immediate visualization of how individual items contribute to their category totals.
This technique is especially valuable when analyzing sales data across product categories and individual items.
Watch my step-by-step video tutorial below to see this process in action:
Setting Up Your Pivot Table
To begin creating an informative pivot table with percentage calculations, we need to start with a simple dataset that contains hierarchical information. In my example, I’m using a dataset that includes categories (such as fruits, vegetables, and sweets) along with the specific products within each category and their corresponding revenue figures.
The process of creating the pivot table is straightforward:
Select your data range
Navigate to the Insert tab in the Excel ribbon
Click on “Pivot Table”
Choose to place the pivot table on an existing worksheet (I selected cell F1 in my demonstration)
Click “OK” to create the basic pivot table structure
Once your pivot table framework is established, you’ll need to structure it properly to show both categories and their constituent products. In the PivotTable Fields panel, drag the appropriate fields to build your hierarchical view.
Structuring Your Pivot Table
For proper hierarchical analysis, you’ll want to arrange your fields in a logical order. In the Rows section of the PivotTable Fields panel, add your Category field first, followed by the Product field. This creates a nested structure where products appear under their respective categories.
For the values section, we need to add the Revenue field twice — once to show the raw sum and once to show the percentage of parent. Simply drag the Revenue field to the Values area twice. By default, Excel will sum these values, which is exactly what we want for this analysis.
Changing the Layout
By default, Excel displays pivot tables in compact form, but for better readability, I prefer the tabular layout. To change this:
Go to the Design tab under PivotTable Tools
Click on “Report Layout”
Select “Show in Tabular Form”
This adjustment separates the Category and Product into distinct columns, making your data more readable and easier to analyze at a glance.
Adding Percentage of Parent Row
Now comes the key part — transforming one of our revenue columns to show percentage of parent row. This calculation will show how each product contributes proportionally to its category total, and how each category contributes to the grand total. Follow these steps:
Right-click on any cell within the second Sum of Revenue column
Select “Show Values As” from the context menu
Choose “% of Parent Row Total”
This simple change transforms the raw numbers into percentages, giving you immediate insight into the proportional contribution of each item. For instance, in my example, you can now see that apples represent approximately 35% of all fruit sales, while the fruits category as a whole represents about 41% of total sales across all categories.
Understanding the Results
After applying the percentage of parent row calculation, your pivot table automatically adjusts to show meaningful proportions at every level:
Individual products show their percentage contribution to their immediate category
Category subtotals show their percentage contribution to the grand total
The grand total always equals 100%
In my demonstration, this clearly showed that sweets accounted for approximately 40% of total sales, vegetables for about 18%, and fruits for approximately 41%. Within each category, you can similarly see the proportional contribution of each product.
Finalizing Your Pivot Table
To make your pivot table more understandable, it’s important to rename the column headers to accurately reflect what each column represents. In our case:
Change the heading of the first sum column to simply read “Sum”
Rename the second column to “Percentage of Parent”
These descriptive headers ensure that anyone viewing your pivot table will immediately understand what the numbers represent without needing additional explanation.
With these adjustments complete, you now have a comprehensive pivot table that not only shows the raw revenue figures but also clearly illustrates the proportional relationships between categories and their constituent products. This dual-view approach provides both absolute and relative perspectives on your data, enabling more nuanced analysis and decision-making.
This technique is particularly valuable when analyzing sales performance, budget allocations, or any hierarchical data where understanding proportional relationships is important. By visualizing both raw numbers and percentages simultaneously, you gain deeper insights into your data structure and can more effectively communicate those insights to others.
In this tutorial, I demonstrate a practical application of recursive functions in Power Query to calculate hierarchy levels in organizational structures or MLM systems. Using a custom function with recursion, we can efficiently determine each person’s position in a hierarchical structure based on their referrer relationships, providing valuable insights for organizational analysis and reporting.
Understanding hierarchy levels is essential for visualizing reporting structures, tracking MLM downlines, or mapping any parent-child relationships in your data.
Understanding the Problem: Hierarchy Levels in Organizations
In many organizational structures, particularly in multi-level marketing (MLM) systems or corporate hierarchies, understanding the level depth of each member is crucial. The level represents how many steps a person is from the top of the organization. For instance, in our example, John is at the top (level 0), Anne is directly below John (level 1), and Thomas is below Anne (level 2).
Our sample data contains an ID column that uniquely identifies each person and a Referrer column that indicates who brought that person into the organization. The referral relationship establishes the hierarchical structure we need to analyze. Our goal is to calculate each person’s hierarchy level automatically using Power Query’s recursive capabilities.
Setting Up Power Query
To begin working with our data, we need to import it into Power Query where we can create and apply our recursive function:
Select any cell in your data table
Go to the Data tab
Click From Table/Range to import your data into Power Query
This imports your data containing the ID and Referrer columns into the Power Query Editor, where we can start building our solution.
Creating a Recursive Function to Calculate Hierarchy Levels
The core of our solution is a custom function that can call itself (recursion) to track up through the hierarchy until it reaches the top. Here’s how to create it:
In the Power Query Editor, go to the Home tab
Click New Source > Other Sources > Blank Query
Now we need to define our function. Our function will require two parameters: the person’s ID we want to calculate the level for and the complete table of people data to reference. The function will:
Find the row for the current person
Get their referrer’s ID
Check if they have a referrer
If they don’t (they’re at the top), return 0
If they do have a referrer, call the same function for the referrer and add 1 to the result
The M code for our function looks like this:
```m
(personID, personTable) =>
let
    personRow = Table.SelectRows(personTable, each [ID] = personID){0},
    personAboveID = personRow[Referrer],
    result = if personAboveID = null
             then 0
             else @HierarchyLevel(personAboveID, personTable) + 1
in
    result
```
Make sure to name your query HierarchyLevel so that the recursive reference to itself works properly. The @ symbol in front of the function name is M’s scoping operator; it lets the function look up its own name, which is what makes the recursive call possible.
Applying the Function to Our Data
After creating our function, we need to apply it to every row in our data table:
Go back to your main query with the imported data
Click Add Column > Invoke Custom Function
Select your HierarchyLevel function
For personID, select the ID column
For personTable, we need to reference the current table
When setting the personTable parameter, we initially might try to reference a column name, but this will generate errors. Instead, we need to reference the entire table from the previous step. In Power Query, we can do this by referring to the previous step name.
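In M, the resulting step might look like the following sketch, assuming the previous step is named #"Changed Type" (your step name may differ):

```m
// Add a "Level" column by calling HierarchyLevel for each row,
// passing the whole table from the previous step as the second argument
AddedLevel = Table.AddColumn(#"Changed Type", "Level",
    each HierarchyLevel([ID], #"Changed Type"))
```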
Once correctly configured, the function will calculate the hierarchy level for each person in our table. Thomas, who is referred by Anne (ID 2), who in turn is referred by John (ID 1), will show as level 2. John, who has no referrer, will be at level 0.
Testing and Validating the Recursion
To verify our function works correctly, we can examine the calculated levels for each person in our organization:
John (ID 1): Level 0 (top of hierarchy, no referrer)
Anne (ID 2): Level 1 (referred by John)
Thomas (ID 5): Level 2 (referred by Anne)
Paul: Level 3 (referred by Thomas)
We can further test by changing referrer relationships. For example, if we change Paul’s referrer from Thomas (ID 5) to someone who is already at level 3, Paul would then become level 4. After making such changes in the source data, we can simply refresh our Power Query to see the updated hierarchy levels.
Key Points About the Recursive Solution
Top-level members (those with null referrers) are assigned level 0
The function will work for organizations of any depth, continuing to recurse up the chain until it reaches the top
If the data contains circular references (Person A refers to Person B who refers back to Person A), the recursion could create an infinite loop — consider adding error handling for this scenario in real applications
In a real-world scenario, you might want to enhance this function to handle more complex requirements, such as detecting circular references or processing multiple hierarchies within the same dataset.
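As a sketch of one such enhancement (the name HierarchyLevelSafe and the depth limit of 100 are my assumptions, not part of the original demo), an optional depth counter can abort the recursion when a circular reference is suspected:

```m
// Hypothetical guarded variant: raises an error after 100 levels to
// avoid an infinite loop if the data contains a circular referral chain
(personID, personTable, optional depth as nullable number) =>
let
    d = if depth = null then 0 else depth,
    personRow = Table.SelectRows(personTable, each [ID] = personID){0},
    personAboveID = personRow[Referrer],
    result =
        if d > 100 then error "Possible circular reference"
        else if personAboveID = null then 0
        else @HierarchyLevelSafe(personAboveID, personTable, d + 1) + 1
in
    result
```

As before, the query must be named HierarchyLevelSafe for the @ reference to resolve.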
Loading the Results Back to Excel
Once you’re satisfied with your hierarchy level calculations:
In the Power Query Editor, go to Home > Close & Load
Your data table with the new hierarchy level column will appear in Excel
Any time your hierarchy changes, simply refresh the query to recalculate all levels
This powerful technique allows you to maintain an up-to-date view of your organizational structure with minimal effort. The recursive approach efficiently handles even large hierarchies without requiring complex formulas or manual tracking.
With this solution in place, you can easily build reports and visualizations that leverage hierarchy level information, enabling better insights into your organizational structure, MLM downlines, or any hierarchical data you’re working with.
Splitting text by length in Power Query can transform cluttered data into organized, usable information without complex formulas. This technique allows you to break down text strings into separate columns based on specific character positions, making it especially useful when dealing with fixed-width data formats that contain multiple pieces of information.
The real power of this approach lies in its ability to handle irregular splitting requirements where each section has a different length.
In my video, I demonstrate the complete process of splitting text by length in Power Query:
Understanding the Data Structure
When working with text data that needs to be split, it’s essential to first analyze the structure of your text. In the demonstration, I work with a dataset where each text string contains several pieces of information with varying lengths:
Person information (30 characters)
Delimiter characters (semicolons, pipes) that need to be removed
Date information (appearing as numbers)
Currency values in different formats
The challenge lies in the fact that each section has a different length, making standard split functions less effective. This is precisely where Power Query’s split by position feature becomes invaluable.
Importing Data to Power Query
The first step in the process is to import your data into Power Query. This can be done easily by selecting your data table and using the From Table/Range option in the Data tab. Once your data is in Power Query, you’ll have access to powerful transformation tools that aren’t available in standard Excel.
Power Query provides a user-friendly interface where you can see your data and apply various transformations step by step. This visual approach makes it easier to track changes and ensure that your data is being processed correctly.
Splitting Text by Position
With the data imported into Power Query, we can now split the text column based on specific positions. Here’s how to do it:
Go to the Home tab in Power Query
Select Split Column and choose “By Positions” (not “By Number of Characters”)
Enter the specific positions where you want to split the text
In the example, I needed to split at positions 0, 30, 31, 39, and 41. It’s important to note that Power Query counts from zero for the first position, not one. These numbers represent the starting points for each section of text.
After pressing OK, Power Query creates new columns based on these position splits. The result is five separate columns, each containing a distinct part of the original text string.
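The generated M step uses Splitter.SplitTextByPositions. Assuming the source column is named Column1 and the previous step is Source (both assumptions), it might look like:

```m
// Split Column1 at character positions 0, 30, 31, 39 and 41
// (positions are zero-based starting offsets of each section)
SplitByPos = Table.SplitColumn(Source, "Column1",
    Splitter.SplitTextByPositions({0, 30, 31, 39, 41}),
    {"Text.1", "Text.2", "Text.3", "Text.4", "Text.5"})
```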
Refining the Split Data
Renaming and Removing Columns
After splitting the text, we need to organize our data by giving meaningful names to the important columns and removing unnecessary ones. In the formula bar, I renamed the columns to reflect their content:
“Text.1” became “Person”
“Text.3” became “Date”
“Text.5” became “Payout”
For columns containing delimiters or other unwanted information (in this case, “Text.2” and “Text.4”), we can simply delete them by selecting them with Ctrl+click and pressing the Delete key.
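In M, the renaming and removal steps might read as follows (the step names SplitByPos, Renamed and Removed are assumptions for this sketch):

```m
// Give meaningful names to the columns we keep
Renamed = Table.RenameColumns(SplitByPos,
    {{"Text.1", "Person"}, {"Text.3", "Date"}, {"Text.5", "Payout"}}),

// Drop the columns that only held delimiter characters
Removed = Table.RemoveColumns(Renamed, {"Text.2", "Text.4"})
```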
Correcting Data Types
Once we have our properly named columns, we need to ensure that each column has the correct data type. In the demonstration:
The “Date” column was initially recognized as an integer and needed to be converted to the date data type
The “Payout” column contained currency values in different formats that needed proper interpretation
Power Query can automatically detect and convert data types, but sometimes manual intervention is necessary. By clicking on the data type icon in the column header, you can force Power Query to interpret the data as a specific type.
Working with Regional Settings
An important aspect of working with dates and currency values is understanding how regional settings affect data interpretation. In Power Query, you can adjust these settings to match the format of your data.
To access these settings, go to:
Options and Settings
Query Options
Current Workbook
Regional Settings
In my demonstration, the regional settings were set to “English (United Kingdom)” which allowed Power Query to correctly interpret the pound (£) currency symbols regardless of their position in the text. If your data uses different regional formats, you can adjust these settings to match your needs.
For individual columns, you can also click on the data type icon and select “Using Locale” to specify both the data type and regional format for that particular column. This gives you fine-grained control over how Power Query interprets your data.
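Applying a locale from M directly might look like this sketch (the column names, the previous step name Removed, and the en-GB culture are assumptions based on the demo):

```m
// Interpret Date as a date and Payout as currency
// using United Kingdom regional settings
Typed = Table.TransformColumnTypes(Removed,
    {{"Date", type date}, {"Payout", Currency.Type}}, "en-GB")
```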
Finalizing the Transformation
After completing all the necessary transformations — splitting the text, renaming columns, removing unnecessary columns, and setting the correct data types — the final step is to load the transformed data back into Excel.
From the Home tab, select “Close & Load” to export your properly organized table back to Excel. The result is a clean, structured dataset with separate columns for person information, dates, and payment values, all with the appropriate data types.
This technique of splitting text by position in Power Query is particularly useful when dealing with fixed-width data exports from legacy systems, standardized report outputs, or any situation where text strings contain multiple data points at known positions. By mastering this approach, you can quickly transform dense, combined text fields into organized and usable data.
Importing fixed-width column data to Excel can be challenging when each column has a set number of characters, often resulting in unnecessary spaces. In this article, I’ll show you how to properly import fixed-width CSV files using Power Query, ensuring clean data without those problematic extra spaces.
This technique is essential for handling structured data exports from legacy systems or standardized reports.
Understanding Fixed-Width Column Data
Fixed-width column data is a specific format where each column occupies exactly the same number of characters in every row. Unlike delimited formats (like comma-separated values), fixed-width files allocate a predetermined number of spaces for each field. This means shorter values get padded with spaces to maintain the fixed width. For example, the date column might always be 10 characters wide, the merchant column 20 characters, and so on.
This format is common in older systems, government data, and standardized reports, but it presents unique challenges when importing to Excel. The main issue is that these extra spaces can cause problems with data analysis and visualization if not properly handled during the import process.
Importing Fixed-Width Data Using Power Query
Power Query provides excellent tools for handling fixed-width column data. Here’s the process to properly import this type of data:
Initial Import Steps
To begin importing fixed-width data, follow these steps:
Go to the Data tab in Excel
Click on “From Text/CSV” option
Locate and select your fixed-width file
When the Power Query Editor opens, you might need to adjust the window size to see all available options, as sometimes the third column of settings may be hidden if the window is too narrow. Make sure to expand the window to see all import settings.
Configuring Import Settings
During the import process, you’ll need to configure several important settings:
First, check the file origin (encoding) — UTF‑8 works well for most modern files. Then, verify that the delimiter is set correctly. For fixed-width files, Power Query typically automatically detects this format. You should see “Fixed Width” selected rather than comma, tab, or other delimiters.
The data type detection setting is another crucial choice. You have three options:
Based on first 200 rows
Based on entire dataset
Do not detect data types
In most cases with fixed-width data, it’s better to select “Do not detect data types” initially and handle data type conversion after cleaning the spaces. This prevents Power Query from making assumptions about your data types that might be affected by the extra spaces.
Handling Extra Spaces in Power Query
The key challenge with fixed-width data is managing the extra spaces that pad each column. When examining the data in Power Query, you’ll notice these extra spaces in both the column headers and text fields. These spaces are necessary in the original file to maintain the fixed-width format, but they’re problematic for analysis in Excel.
For example, when clicking on a column header, you might see something like “Date ” with several spaces after the actual name. Similarly, text fields like product names or merchant names will have trailing spaces to fill their allocated width.
Removing Extra Spaces
To clean up the data properly:
Select all columns in the Power Query Editor by clicking the first column header and then Shift-clicking the last column
Go to the Transform tab
Click on “Format” and select “Trim”
This important step removes all leading and trailing white spaces from every cell in your dataset while preserving spaces that are actually part of your data (like spaces between words in product names). After trimming the data, you’ll notice a significant improvement in how your data looks.
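The generated step applies Text.Trim across the table. A useful idiom here is passing an empty list as the second argument of Table.TransformColumns, which makes Text.Trim the default transformation for every column (Source is an assumed step name):

```m
// Trim leading and trailing spaces in every column of the imported table
Trimmed = Table.TransformColumns(Source, {}, Text.Trim)
```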
Setting Headers and Data Types
After removing the extra spaces, you can proceed with configuring headers and data types:
Use the “Use First Row as Headers” option (found in the Transform tab or by right-clicking on the first row) to set your headers properly. Now that the spaces have been removed, your column names will appear clean and professional.
With clean data, you can now set appropriate data types for each column. For example, date columns should be formatted as dates, and revenue or cost columns as currency or decimal numbers. Power Query will help identify appropriate types, but you can manually change them by clicking the data type icon in each column header.
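In M, these two steps might read as follows (the step name Trimmed and the column names are assumptions from the demo):

```m
// Promote the first (now trimmed) row to column headers
Promoted = Table.PromoteHeaders(Trimmed, [PromoteAllScalars = true]),

// Assign proper data types once the padding spaces are gone
Typed = Table.TransformColumnTypes(Promoted,
    {{"Date", type date}, {"Revenue", Currency.Type}})
```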
Loading Data to Excel
The final step is to load your cleaned data into Excel:
Click “Close & Load” (or “Close & Load To” for more options)
Select whether to load as a table or to a specific location
Choose an existing worksheet or create a new one
When loading the data, you have the option to add it to the data model if you plan to use it with Power Pivot, but this isn’t necessary for basic Excel analysis. One important thing to note is that while Power Query may detect currency data types, this doesn’t automatically apply Excel’s currency formatting to your table. You’ll need to apply appropriate number formatting in Excel after importing.
Fortunately, when you refresh your data in the future, Excel will maintain any number formatting you’ve applied to the table, making future updates seamless.
Benefits of Using Power Query for Fixed-Width Data
Using Power Query for importing fixed-width data offers several advantages over other methods:
The ability to transform data before loading it to Excel saves significant time compared to manual cleanup after import. The process is also repeatable — if you receive updated versions of the same file, you can simply refresh your query rather than repeating the entire import and cleanup process.
Additionally, Power Query maintains a record of all transformations applied, making your data preparation process transparent and easily adjustable if needed. This approach ensures consistency in how your fixed-width data is processed each time.
Merging tables in Power Query with summarized data provides a powerful way to consolidate information without overwhelming detail. By aggregating invoice details such as counts, sums, averages, and medians, you can create concise reports that highlight key insights while maintaining the original table structure.
This technique is particularly valuable when dealing with sales data where each invoice contains multiple line items that need to be condensed.
In my video tutorial, I demonstrate how to effectively merge tables with summarization in Power Query:
Understanding the Problem: Merging Tables with Summarization
When working with relational data like invoices and their detailed items, we often need to combine information from multiple tables. However, simply merging tables can result in redundant data and unnecessary complexity. The challenge is to merge data while simultaneously summarizing the details to provide meaningful insights.
In this demonstration, I work with two tables: one containing basic invoice information and another with detailed line items for each invoice. The goal is to enhance the invoice table with summarized information from the details table, such as the count of items, average quantities, sum of prices, and even statistical measures like medians.
Loading Tables into Power Query
The first step in our process is to load both tables into Power Query for manipulation. This gives us access to the powerful data transformation capabilities that allow for sophisticated merging and summarization.
To begin, I navigate to the Data tab in Excel and select “From Table/Range” to import our first table into Power Query. For this initial details table, I choose to load it only as a connection rather than importing it directly into the worksheet. This is done by selecting “Close and Load To” and then choosing the “Connection only” option.
After loading the details table, I follow the same process to load the invoice table. This time, however, we’ll perform our merge operation directly within this query rather than creating a new one.
Merging the Tables
With both tables loaded into Power Query, we can now merge them using the common invoice ID field. This is where the power of our approach becomes evident.
To perform the merge:
Select the “Merge Queries” command from the ribbon
Choose the second table (details) to merge with
Select matching columns for merging (invoice ID in both tables)
Choose “Left Outer Join” as the join kind to ensure all invoices are included
After pressing OK, Power Query creates a new column in our invoice table that contains the matching rows from the details table. You’ll notice that for each invoice, we now have a nested table showing all the corresponding detail rows. For example, some invoices might have two detail rows, others three, and some even five or more.
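Under the hood, Merge Queries generates a Table.NestedJoin step. Assuming the queries are named Invoices and Details and both carry an Invoice ID column (names are assumptions), it might look like:

```m
// Left outer join: every invoice keeps one row, with its matching
// detail rows nested in a new "Details" table column
Merged = Table.NestedJoin(Invoices, {"Invoice ID"},
    Details, {"Invoice ID"}, "Details", JoinKind.LeftOuter)
```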
The Key Difference: Aggregating Instead of Expanding
The critical step that differentiates this approach from a standard merge is what we do next. Instead of expanding these nested tables (which would create duplicate invoice rows), we want to aggregate the information they contain.
When we click on the expand button (the small triangle) in the column header, we get options to expand or aggregate. By choosing to aggregate, we can perform calculations across all the detail rows for each invoice.
Creating Summarized Metrics
The power of this technique lies in the variety of aggregations we can perform on the detailed data. For our invoice example, I demonstrate several useful metrics:
Count of items: Simple count of how many line items exist for each invoice
Average quantity: Instead of summing quantities, we calculate the average number of items per line
Sum of unit price: Total value of the invoice
Median of unit price: Statistical measure showing the middle value of all unit prices on the invoice
To create these aggregations, I select the appropriate options from the dropdown menu when expanding the nested table column. By unchecking “Use original column name as prefix,” we get cleaner column names in our results.
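Behind the dialog, Power Query generates a Table.AggregateTableColumn step. A sketch of what that step might look like — the step name, nested-column name, and column names ("Quantity", "Unit Price") are assumptions, and the UI may use slightly different aggregation functions:

```m
// Aggregate the nested "Details" tables instead of expanding them
#"Aggregated Details" = Table.AggregateTableColumn(
    #"Merged Queries", "Details",
    {
        {"Invoice ID", List.Count, "Count"},            // count of line items
        {"Quantity", List.Average, "Average Quantity"}, // average per line
        {"Unit Price", List.Sum, "Sum of Unit Price"},  // invoice total
        {"Unit Price", List.Median, "Median of Unit Price"}
    }
)
```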
Refining the Results
After creating our aggregated columns, we may want to refine the column names for clarity. This can be done directly in the formula bar in Power Query. For example, I rename one of the columns to “Count” to better reflect its purpose.
It’s also important to check the data types of our new columns. Power Query automatically assigns data types, but they may need adjustment. In our case, the “Count” column works best as a whole number, since counts will always be whole values.
When working with monetary values like the sum of unit prices, Power Query may preserve the currency data type. However, it’s worth noting that this data type information doesn’t automatically translate to formatting in Excel. We’ll need to apply currency formatting in Excel after loading the data.
Loading the Final Results
Once we’re satisfied with our merged and summarized table, we can load it back to Excel. From the Home tab in Power Query, I select “Close and Load To” and choose to create a new table in the existing worksheet.
The final result is an enhanced invoice table that includes aggregated information from the details table without duplicating invoice records. This gives us a concise view of our data with valuable insights such as the number of items per invoice, average quantities, total values, and statistical measures.
This technique is particularly valuable for financial analysis, sales reporting, and any situation where you need to combine master records with summarized information from related detail records. By leveraging Power Query’s merging and aggregation capabilities, we can create more informative reports while maintaining a clean, non-redundant data structure.
Sales data analysis often requires summarizing results by specific categories to gain valuable business insights. In this article, I demonstrate how to efficiently calculate sales summaries using Power Query’s grouping functionality, providing a powerful alternative to traditional Excel formulas for analyzing merchant performance, income totals, and product distribution.
The grouping technique shown allows for multiple aggregations in a single operation, dramatically improving efficiency when working with larger datasets.
Watch my detailed tutorial video below to follow along with each step:
Understanding the Sales Summary Challenge
When analyzing sales data, we often need to create summaries that show performance metrics grouped by specific categories. In this demonstration, I’m working with a dataset containing sales transactions that include merchants, products, dates, and income values. My goal is to summarize this data to show several key metrics for each merchant:
Count of sales transactions
Sum of income
Average income per transaction
List of unique products sold
While these calculations can be done using Excel formulas (which I’ve prepared in the worksheet for comparison), using Power Query offers significant advantages, especially for larger datasets or repetitive analysis tasks.
Importing Data into Power Query
The first step in our process is to bring the data into Power Query where we can perform the grouping operations:
Select the data table in Excel
Navigate to the Data tab in the Excel ribbon
Click “From Table/Range” to import the data into Power Query
Once the data is loaded into the Power Query Editor, we can see all our sales records and begin the grouping process to create our summary.
Creating Groups in Power Query
To summarize our data by merchant, we need to use the Group By function in Power Query:
Select the Merchant column in the Power Query Editor
Go to the Home tab and click “Group By”
The default Group By dialog offers basic functionality with just one column for grouping and a single aggregation. For our needs, we require the Advanced option since we want to create multiple aggregations based on the merchant column.
Setting Up Advanced Grouping
In the advanced Group By dialog, we need to configure multiple aggregations:
Set the grouping column to Merchant
Add the following aggregations:
“Count Rows” for the number of sales (named “Count”)
“Sum” of the Income column (named “Sum”)
“Average” of the Income column (named “Average”)
“All Rows” to capture all related data (named “List”)
After clicking OK, Power Query creates a new table with our merchant groups and the specified aggregations. The Count, Sum, and Average values match the Excel formulas we prepared earlier, confirming that our grouping is working correctly.
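The dialog above produces a Table.Group step in M. A sketch of roughly what it generates — the source step name is an assumption:

```m
// Group by Merchant with multiple aggregations in one operation
#"Grouped Rows" = Table.Group(
    Source, {"Merchant"},
    {
        {"Count", each Table.RowCount(_), Int64.Type},       // number of sales
        {"Sum", each List.Sum([Income]), type number},       // total income
        {"Average", each List.Average([Income]), type number},
        {"List", each _, type table}                         // "All Rows" nested table
    }
)
```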
Customizing the Product List
The “All Rows” aggregation creates a nested table for each merchant containing all their sales records. However, we only want a list of unique products sold by each merchant, not all the row data. Since the Group By dialog doesn’t offer this specific option, we need to modify the M formula directly.
Modifying the M Formula
To customize our product list, we need to edit the M code that Power Query generated:
The default “All Rows” aggregation produces tables with all columns for each merchant
We need to change this to extract only the Product column values
Furthermore, we need to eliminate duplicates from these lists
The generated formula includes each _, which references the entire row set for each group. We need to replace _ with a reference to just the Product column: [Product]. Additionally, we need to wrap this in the List.Distinct function to remove duplicates:
The modified formula section should look like: List.Distinct([Product])
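Concretely, the edit changes the last aggregation inside the Table.Group call. A before/after sketch (the aggregation name "List" matches the dialog setup above):

```m
// Before: the "All Rows" aggregation keeps the whole nested table
{"List", each _, type table}

// After: keep only the unique Product values for each merchant
{"List", each List.Distinct([Product]), type list}
```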
Transforming the Product List into Text
Now that we have lists of unique products for each merchant, we can transform these lists into comma-separated text values for better readability:
Click on the expand button in the List column header
Select “Extract Values…” rather than “Expand to New Rows”
Choose a custom delimiter (comma followed by space: “, ”)
Click OK
This transforms our product lists into readable text strings showing all the unique products each merchant has sold, separated by commas.
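The "Extract Values…" command produces a Table.TransformColumns step that joins each list with Text.Combine. A sketch, assuming the previous step is named #"Grouped Rows":

```m
// Turn each list of products into a single comma-separated string
#"Extracted Values" = Table.TransformColumns(
    #"Grouped Rows",
    {"List", each Text.Combine(List.Transform(_, Text.From), ", "), type text}
)
```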
Loading the Results Back to Excel
Once we’ve created our summary with all the required metrics, we can load the results back to Excel:
Go to the Home tab and click “Close & Load To…”
Select “Existing Worksheet” and choose where to place the results
Click OK
After loading, we can format the Sum and Average columns as currency to improve readability. The final result is a clean summary table showing the count of sales, total income, average income, and unique products for each merchant.
Benefits of Using Power Query for Grouping
While our example used a relatively small dataset, the advantages of using Power Query for this type of analysis become even more apparent with larger data volumes:
Efficiency — Performs calculations much faster than Excel formulas for large datasets
Repeatability — The query can be refreshed when data changes without redoing formulas
Flexibility — Allows for complex aggregations and transformations that would be difficult with Excel formulas
Data cleaning — Integrates with other Power Query transformations for comprehensive data preparation
By mastering this grouping technique in Power Query, you can create sophisticated sales summaries and other data aggregations with just a few clicks, saving significant time and effort compared to traditional Excel methods.
Using appending in Power Query offers a streamlined approach to automatically grade student exams based on predefined thresholds. This method eliminates manual grading errors and saves significant time when processing large sets of student scores, providing an efficient way to manage academic assessments.
The technique combines student scores with grade thresholds to generate accurate results that can be easily updated when grading criteria change.
Watch my step-by-step tutorial on how to implement this grading system using Power Query:
Understanding the Grading Process with Power Query
Power Query’s appending feature allows us to combine multiple tables in a way that creates relationships between student scores and grade thresholds. The appending process essentially stacks tables on top of each other, creating a single dataset that we can manipulate to automatically assign grades. This approach is particularly useful in educational settings where instructors need to process multiple exam results efficiently.
The primary advantage of using this method is that when grading criteria change, you don’t need to manually reassign grades to each student. Simply update the threshold values in your source table, refresh the query, and all grades will automatically update to reflect the new standards.
Required Resources
Before starting the grading process, you’ll need two essential tables:
A table containing student names and their exam points
A table defining grade thresholds (what point ranges correspond to which letter grades)
These tables will serve as the foundation for our automated grading system. The structure of these tables doesn’t need to be complex—just clearly organized with appropriate headers to identify the data.
Setting Up the Tables in Power Query
To begin implementing the grading system, we need to import both tables into Power Query. This process allows us to manipulate and combine the data efficiently.
First, with the student scores table selected, navigate to the Data tab in Excel. From there, select “From Table/Range” to import the first table into Power Query. When prompted to load the data, choose “Create Connection Only” as we’ll be manipulating this data further before bringing it back to Excel.
Repeat the same process for your grade thresholds table. Having both tables accessible in Power Query is essential for the next step where we’ll combine them through appending.
Appending the Tables
Once both tables are in Power Query, we can combine them using the append function:
On the Home tab within Power Query Editor, click the Append Queries button (not “Append Queries as New”). In the dialog box that appears, confirm that you want to append just the two tables you’ve imported. Select the student table as your primary table and the grades threshold table as the table to append.
After appending, you’ll notice that the resulting table contains all rows from both original tables. The student data appears first, followed by the threshold data. This combined dataset gives us everything we need to assign grades automatically.
Organizing and Sorting the Data
For our grading system to work properly, we need to sort the appended data by the number of points. This sorting is a crucial step because it allows us to arrange scores in ascending order, making it easier to assign appropriate grades.
Click on the column containing the points/scores and sort it ascending (smallest to largest). This action arranges all scores from lowest to highest, with the grade thresholds integrated among the student scores based on their point values.
Because of how the data is now arranged, the grade thresholds appear at positions where they naturally divide the student scores into different grade categories.
Assigning Grades Automatically
With our data properly sorted, we can now use a simple but powerful technique to assign grades to each student. The key to this process is the Fill Down command in Power Query.
The grade column initially has values only in the rows that came from our threshold table. To assign these grades to students, select the grade column and navigate to the Transform tab. Click on “Fill” and then “Down.” This action copies each grade down to all rows below it until another grade is encountered.
As a result, each student row is automatically assigned the grade corresponding to their score based on the defined thresholds. For example, if the threshold for grade C is 40 points, students with 40 or more points (but below the next threshold) will be assigned a C.
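The whole pipeline can be sketched in M as a short chain of steps. The query and column names here — Students, Thresholds, "Points", "Grade", "Name" — are assumptions to adapt to your own tables:

```m
// Append, sort, fill down, and keep only the student rows
let
    Combined = Table.Combine({Students, Thresholds}),      // append the two tables
    Sorted   = Table.Sort(Combined, {{"Points", Order.Ascending}}),
    Filled   = Table.FillDown(Sorted, {"Grade"}),          // copy each grade downwards
    Result   = Table.SelectRows(Filled, each [Name] <> null) // drop threshold rows
in
    Result
```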
Filtering and Finalizing the Results
After filling down the grades, we need to filter out the threshold rows to see only the student results:
Click the filter button on the column containing student names and choose “Remove Empty” (or specifically select only the rows containing actual student data). This leaves us with just the student information and their corresponding grades.
With the data filtered and grades assigned, we can now load this information back to Excel. Navigate to the Home tab and select “Close & Load To.” Choose where you want to place the results in your workbook.
Updating Grades When Thresholds Change
One of the most powerful aspects of this approach is how easily you can adjust grading criteria:
If you need to change the grading thresholds, simply update the values in your original threshold table. For example, you might change the minimum score for an A from 90 to 88 points. After making these changes, save your file and then refresh the query (right-click on your results table and select “Refresh”).
The entire grading process will automatically run again with the new thresholds, updating all student grades accordingly. This makes the system extremely flexible and saves significant time when adjustments are needed.
Real-World Application Example
In my video demonstration, I showed how this system works with a sample dataset. For instance, Jennifer earned 13 points which resulted in an F grade, Robert received 34 points also resulting in an F, while Olivia scored 40 points which met the threshold for an E grade.
After changing the thresholds and refreshing the query, the grades updated automatically to reflect the new standards. This dynamic updating capability is what makes this approach so valuable for educators who may need to adjust grading scales based on class performance or other factors.
Power Query offers powerful solutions for cleaning data across columns by leveraging information contained in related columns. In this article, I’ll demonstrate how to efficiently remove cryptocurrency symbols from a name column when those symbols already exist in a separate column, even when the symbols have varying lengths.
This technique is particularly valuable when dealing with inconsistent data formats that require standardization for analysis.
Watch my step-by-step tutorial video to see these techniques in action:
Understanding the Data Cleaning Challenge
When working with financial data, especially cryptocurrency information, you’ll often encounter datasets where the same information appears redundantly across multiple columns. In my example, I have a table containing cryptocurrency data where the name column includes both the cryptocurrency name and its symbol (like “Bitcoin BTC”), while a separate symbol column already contains just the symbol (“BTC”).
The challenge lies in cleaning the name column by removing these redundant symbols. What makes this particularly tricky is that cryptocurrency symbols vary in length — some might be three characters (BTC), others four (DOGE) or more. This inconsistency means we can’t simply remove a fixed number of characters from the end of each name.
Preparing the Data in Power Query
To begin the cleaning process, we need to load our data into Power Query where we can apply the necessary transformations:
Navigate to the Data tab in Excel
Select your data range and choose to load it to Power Query
Remove any unnecessary columns to focus on the task at hand
In my case, I initially removed several columns to focus on just the key columns for this demonstration — the name column containing both cryptocurrency names and symbols, and the symbol column containing just the symbols. This simplified approach allows us to concentrate specifically on the cleaning technique without distraction.
Solution #1: Using Text After Delimiter
The first solution uses Power Query’s built-in functions to extract the relevant part of the text. This approach is more straightforward and uses the UI-based options:
To implement this solution:
Select the name column in Power Query
Go to the Add Column tab
Choose Extract and then Text After Delimiter
Initially, when using the UI, you might try entering a specific symbol like “BTC” as the delimiter. However, this would only work for one cryptocurrency. The real power comes from modifying the formula to use the symbol column as a reference.
After creating the initial formula, you’ll need to edit it in the formula bar to make it dynamic. Replace the hardcoded delimiter text with a reference to the symbol column. This way, Power Query will use each row’s corresponding symbol to extract just the cryptocurrency name, regardless of the symbol’s length.
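A sketch of the edited step, assuming the symbol precedes the name in the combined column (the flow described above, where the text after the delimiter is the name itself); the step and column names are assumptions:

```m
// Use each row's own Symbol value as the delimiter
#"Inserted Text After Delimiter" = Table.AddColumn(
    #"Removed Columns", "Name Only",
    each Text.Trim(Text.AfterDelimiter([Name], [Symbol])),  // trim the leftover space
    type text
)
```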
Solution #2: Using Text.RemoveRange Function
The second solution is more complex but demonstrates the flexibility of Power Query’s M language. For this approach:
Go to the Add Column tab
Select Custom Column
Use the Text.RemoveRange function to precisely remove the symbol part
This function works by removing a specific range of characters from the text. It requires three parameters:
The source text (our name column)
The starting position (0‑based in Power Query)
The number of characters to remove (which we determine using the length of the symbol)
Remember that Power Query uses zero-based indexing, meaning the first character position is 0, not 1. This is crucial when working with text manipulation functions.
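A sketch of this custom column, assuming the symbol and a separating space sit at the start of the name (hence the removal starting at position 0); step and column names are assumptions:

```m
// Remove the symbol plus one separating space from the start of the name
#"Added Custom" = Table.AddColumn(
    #"Removed Columns", "Name Only",
    each Text.RemoveRange([Name], 0, Text.Length([Symbol]) + 1),
    type text
)
```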
Finalizing the Transformation
After applying either solution, we need to clean up our query:
Rename the newly created column to something meaningful (like “name”)
Delete the original name column that contained the symbols
If needed, restore any previously removed columns by deleting the “Removed Columns” step
Arrange columns in the desired order by dragging them
In my example, I also demonstrated how to clean another column (circulating supply) that had a similar issue but with a simpler pattern. Since all the unwanted symbols in this column were preceded by a space, I could use the Extract Text Before Delimiter function with a space as the delimiter.
Loading the Cleaned Data Back to Excel
Once all transformations are complete, we can load the cleaned data back to Excel:
Go to the Home tab in Power Query
Select Close & Load To
Choose whether to load to a new worksheet or an existing location
In my demonstration, I loaded the data to cell A13 on the existing worksheet, which allows for easy comparison between the original and cleaned data.
Why This Approach Matters
The beauty of using Power Query for this type of data cleaning is that it creates a repeatable process. If your data source updates, you can simply refresh the query, and all the cleaning steps will be reapplied automatically. This saves significant time compared to manual cleaning or using formulas that would need to be copied down for each new set of data.
Additionally, by leveraging information from one column to clean another, we create a more robust solution that can handle variations in the data format without breaking. This approach is far superior to hard-coding specific text values or character counts that might change across different datasets.
In this guide, I’ll show you how to automatically remove completely empty columns in your Excel data using Power Query. This powerful technique saves you from tedious manual deletion and works even when columns contain null values, empty cells, and empty text strings (represented by “”).
The approach I’ll demonstrate uses a specific Power Query formula that identifies and removes columns with no meaningful data, greatly improving your data cleaning workflow.
Watch my detailed video walkthrough of this process:
Understanding Empty Columns in Power Query
When working with data in Excel, we often encounter columns that appear empty but can actually contain different types of “emptiness” that Power Query recognizes differently. It’s important to understand that Power Query distinguishes between null values (truly empty cells) and cells containing empty text strings (“”).
An empty text string occurs when a cell contains a formula that returns “” or when you paste such values. These look empty visually in Excel, but Power Query treats them differently than cells with null values. Our goal is to create a solution that identifies and removes columns regardless of which type of “emptiness” they contain, as long as the entire column contains no meaningful data.
Importing Data into Power Query
To begin the process of removing empty columns, we first need to import our data into Power Query. Here’s how to do this:
Go to the Data tab in Excel
Click on “Get Data” → “From File” → “From Excel”
Locate and select your Excel file
Choose the worksheet containing your data (in my example, Sheet1)
After importing, you’ll notice that some columns might appear empty but contain different types of emptiness — some cells might have null values (shown as “null” in Power Query), while others might be truly empty or contain empty text strings. Our code will handle all these scenarios automatically.
The Power Query Solution
Rather than manually deleting empty columns, we’ll use a custom formula in Power Query that automatically identifies and removes all empty columns. This approach is much more efficient, especially when working with large datasets.
The Complete Code
Here’s the complete Power Query formula that we’ll use to remove empty columns:
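A sketch of such a formula, matching the breakdown below — the step name #"Change Type" is a placeholder for your own previous step:

```m
// Remove every column whose values are all null or empty strings
= Table.RemoveColumns(
    #"Change Type",
    List.Select(
        Table.ColumnNames(#"Change Type"),
        each List.Count(
            List.RemoveMatchingItems(
                Table.Column(#"Change Type", _),  // all values in this column
                {null, ""}                        // drop nulls and empty strings
            )
        ) = 0                                     // nothing left → column is empty
    )
)
```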
To use this code in your own projects, you’ll need to replace #“Change Type” with the name of your previous step in Power Query. This reference appears in three places in the formula, so be sure to update all instances.
Breaking Down the Formula
Let’s break down this formula to understand how it works:
The formula starts with Table.RemoveColumns(), which is the function that will eventually remove our empty columns from the table. This function takes two parameters: the table we’re working with, and a list of column names to remove.
To generate the list of column names to remove, we use List.Select(), which filters a list based on a condition. We’re filtering the list of all column names (obtained using Table.ColumnNames()) to find those that meet our “empty column” criteria.
Extract all values in the column using Table.Column()
Remove any values that are empty strings or null using List.RemoveMatchingItems()
Count how many items remain using List.Count()
Check if this count equals zero (meaning the column is completely empty)
If the count equals zero, the column is considered empty and will be included in our list of columns to remove.
Implementing the Solution
To implement this solution in your Power Query Editor:
Click “Add Step” to create a new transformation step
In the formula bar, paste the complete code shown above
Replace “#“Change Type”” with the name of your previous step (typically the step after promoting headers)
Press Enter to apply the transformation
You should now see your data with all empty columns removed. The transformation happens immediately, leaving only columns that contain at least one non-empty value.
Testing the Solution
To verify that our solution works correctly, we can test it with different types of data. In my video, I demonstrated this by:
1. Starting with a dataset that contained several empty columns (with various types of emptiness)
2. Applying our Power Query solution to remove these columns
3. Modifying the source data to add a new column with a single value
4. Refreshing the query to verify that the column with a value is preserved
The solution correctly identified and retained columns that had at least one meaningful value, while removing columns that were completely empty.
Advanced Usage and Adaptations
This solution can be easily adapted to work with different datasets. The key is to ensure you’re referencing the correct previous step in your Power Query transformation sequence. If your previous step is named differently, simply replace all instances of “#“Change Type”” with the appropriate step name.
When working with very large datasets, this automated approach can save significant time compared to manually identifying and removing empty columns. Additionally, the solution is dynamic — if your source data changes and a previously empty column now contains values, refreshing the query will automatically retain that column.
Loading the Results
After applying our transformation, we can load the results back to Excel:
Click “Close & Load” (or “Close & Load To…” if you want to specify a destination)
Choose your preferred destination (worksheet, data model, etc.)
The cleaned data, with empty columns removed, will be loaded to your specified location
The resulting table will contain only meaningful columns that have at least one value. This makes your data more concise and easier to work with.
When your source data changes, simply refresh the query to apply the same transformations to the updated data. The empty column removal logic will be reapplied automatically, adapting to any changes in your source data’s structure.