Remove Top or Bottom Rows Automatically

Dynamically removing top and bottom rows in Power Query can significantly streamline your data cleaning process when dealing with inconsistent data imports. This technique allows you to automatically eliminate unnecessary header rows or footer information based on specific conditions rather than fixed row counts, making your data transformation process more robust and adaptable to changing source files.

The ability to use conditions rather than static numbers is particularly valuable when working with regularly updated reports that may contain varying amounts of metadata or summary information.

In my video, I demonstrate how to implement this dynamic row removal technique that I learned at an Excel London Meetup:

Understanding the Problem with Static Row Removal

When importing data from text files, CSV files, or other sources, you often encounter extraneous information at the top and bottom of your dataset. These might include title rows, explanatory notes, summary statistics, or footer information that aren’t part of the actual data you need to analyze. Using Power Query’s standard “Remove Top Rows” or “Remove Bottom Rows” functions with a fixed number works fine when your source data structure never changes, but becomes problematic when the number of these rows varies.

In the example shown in my video, we have multiple rows of metadata before the actual header row (which contains “Date” as the first column value), and several rows of additional information at the bottom of the data that need to be removed. There are also some missing values throughout the data that make simple filtering ineffective.

Dynamically Removing Top Rows Based on Conditions

The conventional approach to removing top rows in Power Query involves specifying a fixed number. However, this can be problematic when the number of header rows changes. The dynamic solution involves using a condition rather than a fixed count.

Here’s how to implement this technique:

  1. Go to the Home tab in Power Query Editor
  2. Select “Remove Rows” and then “Remove Top Rows”
  3. Instead of entering a number, modify the formula to use the “each” keyword followed by a condition

The formula will look something like this: each [Column1] <> "Date" (assuming “Date” is the header text in your first column). This tells Power Query to keep removing rows until it finds a row where the first column contains the text “Date”.

The Technical Details of the Table.Skip Function

Behind the scenes, Power Query uses the Table.Skip function when you remove top rows. This function has a hidden capability that isn’t obvious from the user interface: it can accept either a count or a condition as its second parameter.

When using a condition, the syntax changes from simply providing a number to using the format:

each [condition]

The condition is evaluated for each row, starting from the top, and rows are removed until the condition is no longer true. This allows for dynamic adaptation to varying source data structures.
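
Putting this together, the step in the formula bar looks something like the following sketch (step and column names such as Source and Column1 are illustrative and will differ in your query):

```
// Skip rows from the top while the first column is not "Date";
// the first row where the condition is false becomes the new top row.
= Table.Skip(Source, each [Column1] <> "Date")
```

Because the condition is evaluated row by row, this same step keeps working whether the metadata section above the header is three rows long or thirty.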

Dynamically Removing Bottom Rows

Similarly, we can apply the same concept to remove rows from the bottom of our dataset. This is particularly useful when dealing with files that contain summary information, notes, or other footer data that should be excluded from analysis.

The process for removing bottom rows dynamically is:

  1. Go to the Home tab in Power Query Editor
  2. Select “Remove Rows” and then “Remove Bottom Rows”
  3. Replace the static number with a condition using the “each” keyword

For example, you might use a formula like: each [Merchant] = "" to remove rows from the bottom where the Merchant column contains an empty text string. Or you might use each [Revenue] = null to remove rows where the Revenue column contains null values.
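
Behind the scenes, removing bottom rows uses the Table.RemoveLastN function, which accepts a condition in the same way that Table.Skip does. A sketch, using the column names from the example (PreviousStep stands for whatever step precedes this one):

```
// Remove rows from the bottom while the Merchant column is an empty string:
= Table.RemoveLastN(PreviousStep, each [Merchant] = "")

// Or remove rows from the bottom while the Revenue column is null:
= Table.RemoveLastN(PreviousStep, each [Revenue] = null)
```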

Handling Different Types of Empty Values

When working with bottom rows, it’s important to understand the different types of empty values that might appear in your data:

  • Empty text strings — represented by "" in formulas
  • Null values — represented by null in formulas
  • Missing values — which might be null or empty depending on the data source

In the video demonstration, I show how to handle both empty text strings and null values as conditions for removing bottom rows. The key is to identify which column and which type of empty value reliably indicates the footer section of your data.

Practical Applications and Benefits

This dynamic row removal technique is particularly valuable in several scenarios:

When working with regularly updated reports where the structure might change slightly between versions, this approach ensures your Power Query solution remains robust. It’s also helpful when dealing with data exports from systems that include varying amounts of metadata or when consolidating multiple files that might have different header structures.

The major benefits include:

  • More resilient data transformation processes that don’t break when source formats change slightly
  • Reduced need for manual intervention when processing new data
  • Ability to handle files with inconsistent structure automatically
  • Greater flexibility compared to static row removal or simple filtering

This technique demonstrates the power of the M language in Power Query, allowing for solutions that go beyond what’s immediately available in the user interface. By understanding and leveraging these more advanced capabilities, you can create more robust data transformation processes.

Important Considerations

When implementing this technique, keep in mind a few important points:

The condition you use must reliably identify the boundary between the rows you want to keep and those you want to remove. Choose column values that are consistently present (or consistently absent) at these boundaries. Also be aware that if your condition never evaluates to false, you could potentially remove all rows from your dataset, so testing with representative sample data is essential.

Additionally, remember that this technique works even when you have missing values in your actual data. As shown in the video, rows are only removed when they match the specific condition you’ve defined, allowing rows with some missing values to be retained as long as they don’t match your removal condition.

How to Split Text by Position in Power Query

Splitting text by length in Power Query can transform cluttered data into organized, usable information without complex formulas. This technique allows you to break down text strings into separate columns based on specific character positions, making it especially useful when dealing with fixed-width data formats that contain multiple pieces of information.

The real power of this approach lies in its ability to handle irregular splitting requirements where each section has a different length.

In my video, I demonstrate the complete process of splitting text by length in Power Query:

Understanding the Data Structure

When working with text data that needs to be split, it’s essential to first analyze the structure of your text. In the demonstration, I work with a dataset where each text string contains several pieces of information with varying lengths:

  • Person information (30 characters)
  • Delimiter characters (semicolons, pipes) that need to be removed
  • Date information (appearing as numbers)
  • Currency values in different formats

The challenge lies in the fact that each section has a different length, making standard split functions less effective. This is precisely where Power Query’s split by position feature becomes invaluable.

Importing Data to Power Query

The first step in the process is to import your data into Power Query. This can be done easily by selecting your data table and using the From Table/Range option in the Data tab. Once your data is in Power Query, you’ll have access to powerful transformation tools that aren’t available in standard Excel.

Power Query provides a user-friendly interface where you can see your data and apply various transformations step by step. This visual approach makes it easier to track changes and ensure that your data is being processed correctly.

Splitting Text by Position

With the data imported into Power Query, we can now split the text column based on specific positions. Here’s how to do it:

  1. Go to the Home tab in Power Query
  2. Select Split Column and choose “By Positions” (not “By Number of Characters”)
  3. Enter the specific positions where you want to split the text

In the example, I needed to split at positions 0, 30, 31, 39, and 41. It’s important to note that Power Query counts from zero for the first position, not one. These numbers represent the starting points for each section of text.

After pressing OK, Power Query creates new columns based on these position splits. The result is five separate columns, each containing a distinct part of the original text string.
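
In M, the generated step uses Table.SplitColumn with a Splitter.SplitTextByPositions splitter. A sketch, assuming the source column is named Column1 (the column names Power Query generates may differ slightly):

```
// Split Column1 at zero-based character positions 0, 30, 31, 39 and 41,
// producing five new columns:
= Table.SplitColumn(Source, "Column1",
    Splitter.SplitTextByPositions({0, 30, 31, 39, 41}),
    {"Text.1", "Text.2", "Text.3", "Text.4", "Text.5"})
```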

Refining the Split Data

Renaming and Removing Columns

After splitting the text, we need to organize our data by giving meaningful names to the important columns and removing unnecessary ones. In the formula bar, I renamed the columns to reflect their content:

  • “Text.1” became “Person”
  • “Text.3” became “Date”
  • “Text.5” became “Payout”

For columns containing delimiters or other unwanted information (in this case, “Text.2” and “Text.4”), we can simply delete them by selecting them with Ctrl+click and pressing the Delete key.
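
In M, these two clean-up actions correspond to steps like the following sketch (the step names SplitStep and RenamedStep are illustrative):

```
// Rename the useful columns...
= Table.RenameColumns(SplitStep,
    {{"Text.1", "Person"}, {"Text.3", "Date"}, {"Text.5", "Payout"}})

// ...and delete the delimiter-only columns:
= Table.RemoveColumns(RenamedStep, {"Text.2", "Text.4"})
```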

Correcting Data Types

Once we have our properly named columns, we need to ensure that each column has the correct data type. In the demonstration:

  • The “Date” column was initially recognized as an integer and needed to be converted to the date data type
  • The “Payout” column contained currency values in different formats that needed proper interpretation

Power Query can automatically detect and convert data types, but sometimes manual intervention is necessary. By clicking on the data type icon in the column header, you can force Power Query to interpret the data as a specific type.

Working with Regional Settings

An important aspect of working with dates and currency values is understanding how regional settings affect data interpretation. In Power Query, you can adjust these settings to match the format of your data.

To access these settings, go to:

  1. Options and Settings
  2. Query Options
  3. Current Workbook
  4. Regional Settings

In my demonstration, the regional settings were set to “English (United Kingdom)”, which allowed Power Query to correctly interpret the pound (£) currency symbols regardless of their position in the text. If your data uses different regional formats, you can adjust these settings to match your needs.

For individual columns, you can also click on the data type icon and select “Using Locale” to specify both the data type and regional format for that particular column. This gives you fine-grained control over how Power Query interprets your data.
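
Choosing “Using Locale” generates a Table.TransformColumnTypes step with an extra culture argument. A sketch using the en-GB locale from the demonstration (column names are from the example):

```
// Convert Date and Payout using United Kingdom conventions, so day-first
// dates and £ amounts are interpreted correctly:
= Table.TransformColumnTypes(PreviousStep,
    {{"Date", type date}, {"Payout", Currency.Type}}, "en-GB")
```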

Finalizing the Transformation

After completing all the necessary transformations — splitting the text, renaming columns, removing unnecessary columns, and setting the correct data types — the final step is to load the transformed data back into Excel.

From the Home tab, select “Close & Load” to export your properly organized table back to Excel. The result is a clean, structured dataset with separate columns for person information, dates, and payment values, all with the appropriate data types.

This technique of splitting text by position in Power Query is particularly useful when dealing with fixed-width data exports from legacy systems, standardized report outputs, or any situation where text strings contain multiple data points at known positions. By mastering this approach, you can quickly transform dense, combined text fields into organized and usable data.

How to Import Fixed-Width Files to Excel

Importing fixed-width column data to Excel can be challenging when each column has a set number of characters, often resulting in unnecessary spaces. In this article, I’ll show you how to properly import fixed-width CSV files using Power Query, ensuring clean data without those problematic extra spaces.

This technique is essential for handling structured data exports from legacy systems or standardized reports.

Understanding Fixed-Width Column Data

Fixed-width column data is a specific format where each column occupies exactly the same number of characters in every row. Unlike delimited formats (like comma-separated values), fixed-width files allocate a predetermined number of spaces for each field. This means shorter values get padded with spaces to maintain the fixed width. For example, the date column might always be 10 characters wide, the merchant column 20 characters, and so on.

This format is common in older systems, government data, and standardized reports, but it presents unique challenges when importing to Excel. The main issue is that these extra spaces can cause problems with data analysis and visualization if not properly handled during the import process.

Importing Fixed-Width Data Using Power Query

Power Query provides excellent tools for handling fixed-width column data. Here’s the process to properly import this type of data:

Initial Import Steps

To begin importing fixed-width data, follow these steps:

  1. Go to the Data tab in Excel
  2. Click the “From Text/CSV” option
  3. Locate and select your fixed-width file

When the Power Query Editor opens, you might need to adjust the window size to see all available options, as sometimes the third column of settings may be hidden if the window is too narrow. Make sure to expand the window to see all import settings.

Configuring Import Settings

During the import process, you’ll need to configure several important settings:

First, check the file origin (encoding) — UTF-8 works well for most modern files. Then, verify that the delimiter is set correctly. For fixed-width files, Power Query typically detects this format automatically: you should see “Fixed Width” selected rather than comma, tab, or other delimiters.

The data type detection setting is another crucial choice. You have three options:

  • Based on first 200 rows
  • Based on entire dataset
  • Do not detect data types

In most cases with fixed-width data, it’s better to select “Do not detect data types” initially and handle data type conversion after cleaning the spaces. This prevents Power Query from making assumptions about your data types that might be affected by the extra spaces.

Handling Extra Spaces in Power Query

The key challenge with fixed-width data is managing the extra spaces that pad each column. When examining the data in Power Query, you’ll notice these extra spaces in both the column headers and text fields. These spaces are necessary in the original file to maintain the fixed-width format, but they’re problematic for analysis in Excel.

For example, when clicking on a column header, you might see something like “Date    ” with several spaces after the actual name. Similarly, text fields like product names or merchant names will have trailing spaces to fill their allocated width.

Removing Extra Spaces

To clean up the data properly:

  1. Select all columns in the Power Query Editor by clicking the first column header and then Shift-clicking the last column
  2. Go to the Transform tab
  3. Click on “Format” and select “Trim”

This important step removes all leading and trailing white space from every cell in your dataset while preserving spaces that are actually part of your data (like spaces between words in product names). After trimming the data, you’ll notice a significant improvement in how your data looks.
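
In M, selecting every column and applying Trim can be sketched as a single step. The UI actually lists each column explicitly, but while all columns are still untyped text, a default transformation covers them all:

```
// Apply Text.Trim to every column; {} means no column-specific
// transformations, and Text.Trim is the default applied to the rest:
= Table.TransformColumns(Source, {}, Text.Trim)
```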

Setting Headers and Data Types

After removing the extra spaces, you can proceed with configuring headers and data types:

Use the “Use First Row as Headers” option (found in the Transform tab or by right-clicking on the first row) to set your headers properly. Now that the spaces have been removed, your column names will appear clean and professional.

With clean data, you can now set appropriate data types for each column. For example, date columns should be formatted as dates, and revenue or cost columns as currency or decimal numbers. Power Query will help identify appropriate types, but you can manually change them by clicking the data type icon in each column header.
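
The two actions above can be sketched in M as follows (column names such as Date and Revenue are assumptions based on the example):

```
// Promote the trimmed first row to column headers...
= Table.PromoteHeaders(TrimmedStep, [PromoteAllScalars = true])

// ...then assign proper data types:
= Table.TransformColumnTypes(PromotedStep,
    {{"Date", type date}, {"Revenue", Currency.Type}})
```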

Loading Data to Excel

The final step is to load your cleaned data into Excel:

  1. Click “Close & Load” (or “Close & Load To” for more options)
  2. Select whether to load as a table or to a specific location
  3. Choose an existing worksheet or create a new one

When loading the data, you have the option to add it to the data model if you plan to use it with Power Pivot, but this isn’t necessary for basic Excel analysis. One important thing to note is that while Power Query may detect currency data types, this doesn’t automatically apply Excel’s currency formatting to your table. You’ll need to apply appropriate number formatting in Excel after importing.

Fortunately, when you refresh your data in the future, Excel will maintain any number formatting you’ve applied to the table, making future updates seamless.

Benefits of Using Power Query for Fixed-Width Data

Using Power Query for importing fixed-width data offers several advantages over other methods:

The ability to transform data before loading it to Excel saves significant time compared to manual cleanup after import. The process is also repeatable — if you receive updated versions of the same file, you can simply refresh your query rather than repeating the entire import and cleanup process.

Additionally, Power Query maintains a record of all transformations applied, making your data preparation process transparent and easily adjustable if needed. This approach ensures consistency in how your fixed-width data is processed each time.

Merge Tables with Summarized Data for Cleaner Reports

Merging tables in Power Query with summarized data provides a powerful way to consolidate information without overwhelming detail. By aggregating invoice details such as counts, sums, averages, and medians, you can create concise reports that highlight key insights while maintaining the original table structure.

This technique is particularly valuable when dealing with sales data where each invoice contains multiple line items that need to be condensed.

In my video tutorial, I demonstrate how to effectively merge tables with summarization in Power Query:

Understanding the Problem: Merging Tables with Summarization

When working with relational data like invoices and their detailed items, we often need to combine information from multiple tables. However, simply merging tables can result in redundant data and unnecessary complexity. The challenge is to merge data while simultaneously summarizing the details to provide meaningful insights.

In this demonstration, I work with two tables: one containing basic invoice information and another with detailed line items for each invoice. The goal is to enhance the invoice table with summarized information from the details table, such as the count of items, average quantities, sum of prices, and even statistical measures like medians.

Loading Tables into Power Query

The first step in our process is to load both tables into Power Query for manipulation. This gives us access to the powerful data transformation capabilities that allow for sophisticated merging and summarization.

To begin, I navigate to the Data tab in Excel and select “From Table/Range” to import our first table into Power Query. For this initial details table, I choose to load it only as a connection rather than importing it directly into the worksheet. This is done by selecting “Close and Load To” and then choosing the “Connection only” option.

After loading the details table, I follow the same process to load the invoice table. This time, however, we’ll perform our merge operation directly within this query rather than creating a new one.

Merging the Tables

With both tables loaded into Power Query, we can now merge them using the common invoice ID field. This is where the power of our approach becomes evident.

To perform the merge:

  1. Select the “Merge Queries” command from the ribbon
  2. Choose the second table (details) to merge with
  3. Select matching columns for merging (invoice ID in both tables)
  4. Choose “Left Outer Join” as the join kind to ensure all invoices are included

After pressing OK, Power Query creates a new column in our invoice table that contains the matching rows from the details table. You’ll notice that for each invoice, we now have a nested table showing all the corresponding detail rows. For example, some invoices might have two detail rows, others three, and some even five or more.
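
In M, this merge is a Table.NestedJoin step. A sketch, assuming the two queries are named Invoices and InvoiceDetails and the key column is called “Invoice ID” in both:

```
// Left outer join: keep every invoice, and add a "Details" column
// containing a nested table of matching line items:
= Table.NestedJoin(Invoices, {"Invoice ID"}, InvoiceDetails, {"Invoice ID"},
    "Details", JoinKind.LeftOuter)
```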

The Key Difference: Aggregating Instead of Expanding

The critical step that differentiates this approach from a standard merge is what we do next. Instead of expanding these nested tables (which would create duplicate invoice rows), we want to aggregate the information they contain.

When we click on the expand button (the small two-arrow icon) in the column header, we get options to expand or aggregate. By choosing to aggregate, we can perform calculations across all the detail rows for each invoice.

Creating Summarized Metrics

The power of this technique lies in the variety of aggregations we can perform on the detailed data. For our invoice example, I demonstrate several useful metrics:

  • Count of items: Simple count of how many line items exist for each invoice
  • Average quantity: Instead of summing quantities, we calculate the average number of items per line
  • Sum of unit price: Total value of the invoice
  • Median of unit price: Statistical measure showing the middle value of all unit prices on the invoice

To create these aggregations, I select the appropriate options from the dropdown menu when expanding the nested table column. By unchecking “Use original column name as prefix,” we get cleaner column names in our results.
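
Behind the scenes, choosing to aggregate produces a Table.AggregateTableColumn step rather than an expand. A sketch with the metrics from the example (column names such as “Unit Price” and “Quantity” are assumptions):

```
// Collapse each nested "Details" table into summary columns:
= Table.AggregateTableColumn(MergedStep, "Details", {
    {"Unit Price", List.Count,   "Count of items"},
    {"Quantity",   List.Average, "Average of Quantity"},
    {"Unit Price", List.Sum,     "Sum of Unit Price"},
    {"Unit Price", List.Median,  "Median of Unit Price"}})
```

Note that the same detail column (Unit Price here) can feed several aggregations at once.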

Refining the Results

After creating our aggregated columns, we may want to refine the column names for clarity. This can be done directly in the formula bar in Power Query. For example, I rename one of the columns to “Count” to better reflect its purpose.

It’s also important to check the data types of our new columns. Power Query automatically assigns data types, but they may need adjustment. In our case, the “Count” column can safely be set as a whole number, since counts will always be whole numbers.

When working with monetary values like the sum of unit prices, Power Query may preserve the currency data type. However, it’s worth noting that this data type information doesn’t automatically translate to formatting in Excel. We’ll need to apply currency formatting in Excel after loading the data.

Loading the Final Results

Once we’re satisfied with our merged and summarized table, we can load it back to Excel. From the Home tab in Power Query, I select “Close and Load To” and choose to create a new table in the existing worksheet.

The final result is an enhanced invoice table that includes aggregated information from the details table without duplicating invoice records. This gives us a concise view of our data with valuable insights such as the number of items per invoice, average quantities, total values, and statistical measures.

This technique is particularly valuable for financial analysis, sales reporting, and any situation where you need to combine master records with summarized information from related detail records. By leveraging Power Query’s merging and aggregation capabilities, we can create more informative reports while maintaining a clean, non-redundant data structure.

How to Create Sales Summaries with Power Query

Sales data analysis often requires summarizing results by specific categories to gain valuable business insights. In this article, I demonstrate how to efficiently calculate sales summaries using Power Query’s grouping functionality, providing a powerful alternative to traditional Excel formulas for analyzing merchant performance, income totals, and product distribution.

The grouping technique shown allows for multiple aggregations in a single operation, dramatically improving efficiency when working with larger datasets.

Watch my detailed tutorial video below to follow along with each step:

Understanding the Sales Summary Challenge

When analyzing sales data, we often need to create summaries that show performance metrics grouped by specific categories. In this demonstration, I’m working with a dataset containing sales transactions that include merchants, products, dates, and income values. My goal is to summarize this data to show several key metrics for each merchant:

  • Count of sales transactions
  • Sum of income
  • Average income per transaction
  • List of unique products sold

While these calculations can be done using Excel formulas (which I’ve prepared in the worksheet for comparison), using Power Query offers significant advantages, especially for larger datasets or repetitive analysis tasks.

Importing Data into Power Query

The first step in our process is to bring the data into Power Query where we can perform the grouping operations:

  1. Select the data table in Excel
  2. Navigate to the Data tab in the Excel ribbon
  3. Click “From Table/Range” to import the data into Power Query

Once the data is loaded into the Power Query Editor, we can see all our sales records and begin the grouping process to create our summary.

Creating Groups in Power Query

To summarize our data by merchant, we need to use the Group By function in Power Query:

  1. Select the Merchant column in the Power Query Editor
  2. Go to the Home tab and click “Group By”

The default Group By dialog offers basic functionality with just one column for grouping and a single aggregation. For our needs, we require the Advanced option, since we want to create multiple aggregations based on the merchant column.

Setting Up Advanced Grouping

In the advanced Group By dialog, we need to configure multiple aggregations:

  1. Set the grouping column to Merchant
  2. Add the following aggregations:
    • “Count Rows” for the number of sales (named “Count”)
    • “Sum” of the Income column (named “Sum”)
    • “Average” of the Income column (named “Average”)
    • “All Rows” to capture all related data (named “List”)

After clicking OK, Power Query creates a new table with our merchant groups and the specified aggregations. The Count, Sum, and Average values match the Excel formulas we prepared earlier, confirming that our grouping is working correctly.
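
The advanced dialog generates a single Table.Group step along these lines (a sketch; Source stands for the previous step, and the “All Rows” aggregation keeps the whole nested table for each merchant):

```
// Group by Merchant with four aggregations in one operation:
= Table.Group(Source, {"Merchant"}, {
    {"Count",   each Table.RowCount(_), Int64.Type},
    {"Sum",     each List.Sum([Income]), type number},
    {"Average", each List.Average([Income]), type number},
    {"List",    each _, type table}})
```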

Customizing the Product List

The “All Rows” aggregation creates a nested table for each merchant containing all their sales records. However, we only want a list of unique products sold by each merchant, not all the row data. Since the Group By dialog doesn’t offer this specific option, we need to modify the M formula directly.

Modifying the M Formula

To customize our product list, we need to edit the M code that Power Query generated:

  1. The default “All Rows” aggregation produces tables with all columns for each merchant
  2. We need to change this to extract only the Product column values
  3. Furthermore, we need to eliminate duplicates from these lists

The original formula includes {_}, which references the entire row. We need to replace this with a reference to just the Product column: [Product]. Additionally, we need to wrap this in the List.Distinct function to remove duplicates:

The modified formula section should look like: List.Distinct([Product])
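
Within the Table.Group step, only the last aggregation needs to change. Before and after, that part of the formula reads roughly:

```
// Before: the "All Rows" aggregation keeps the whole nested table
{"List", each _, type table}

// After: keep only the distinct values of the Product column
{"List", each List.Distinct([Product]), type list}
```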

Transforming the Product List into Text

Now that we have lists of unique products for each merchant, we can transform these lists into comma-separated text values for better readability:

  1. Click on the expand button in the List column header
  2. Select “Extract Values…” rather than “Expand to New Rows”
  3. Choose a custom delimiter (comma followed by space: “, ”)
  4. Click OK

This transforms our product lists into readable text strings showing all the unique products each merchant has sold, separated by commas.
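
The “Extract Values” command generates a Table.TransformColumns step along these lines (a sketch; List.Transform with Text.From converts any non-text items before combining):

```
// Join each merchant's distinct product list into one comma-separated string:
= Table.TransformColumns(GroupedStep,
    {"List", each Text.Combine(List.Transform(_, Text.From), ", "), type text})
```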

Loading the Results Back to Excel

Once we’ve created our summary with all the required metrics, we can load the results back to Excel:

  1. Go to the Home tab and click “Close & Load To…”
  2. Select “Existing Worksheet” and choose where to place the results
  3. Click OK

After loading, we can format the Sum and Average columns as currency to improve readability. The final result is a clean summary table showing the count of sales, total income, average income, and unique products for each merchant.

Benefits of Using Power Query for Grouping

While our example used a relatively small dataset, the advantages of using Power Query for this type of analysis become even more apparent with larger data volumes:

  • Efficiency — performs calculations much faster than Excel formulas for large datasets
  • Repeatability — the query can be refreshed when data changes without redoing formulas
  • Flexibility — allows for complex aggregations and transformations that would be difficult with Excel formulas
  • Data cleaning — integrates with other Power Query transformations for comprehensive data preparation

By mastering this grouping technique in Power Query, you can create sophisticated sales summaries and other data aggregations with just a few clicks, saving significant time and effort compared to traditional Excel methods.