How to Rank Data Within Groups in Excel: COUNTIFS Method for Analysis

How to Rank Data Within Groups in Excel: COUNTIFS Method for Analysis

Learn­ing how to rank data with­in spe­cif­ic groups is a valu­able Excel skill that can help you orga­nize and ana­lyze infor­ma­tion more effec­tive­ly. By using the COUNTIFS func­tion, you can eas­i­ly deter­mine the rank­ing posi­tion of ele­ments with­in their respec­tive cat­e­gories with­out man­u­al cal­cu­la­tion, sav­ing time and enhanc­ing your data analy­sis capabilities.

This tech­nique is par­tic­u­lar­ly use­ful when work­ing with sales data, per­for­mance met­rics, or any dataset where you need to iden­ti­fy top per­form­ers with­in dif­fer­ent categories.

In my video, I demon­strate how to imple­ment rank­ing with­in groups using a prac­ti­cal example:

Understanding Rank in Groups

When ana­lyz­ing data, we often need to rank items not just over­all but with­in spe­cif­ic cat­e­gories or groups. For exam­ple, you might want to know which sales­per­son per­formed best in each region, or which prod­uct sold most in each store. This is where rank­ing with­in groups becomes essential.

In my demon­stra­tion, I work with a dataset that’s sort­ed by town (our group­ing vari­able) and col­or-cod­ed for eas­i­er visu­al­iza­tion. The data includes dif­fer­ent towns like Moria, Iron­forge, and Azmarin, with each town rep­re­sent­ing a dis­tinct group for our rank­ing purposes.

What makes this tech­nique pow­er­ful is that you can imme­di­ate­ly see how items rank with­in their spe­cif­ic cat­e­gories rather than just see­ing an over­all rank­ing that might be dom­i­nat­ed by one par­tic­u­lar group. This pro­vides much more action­able insights for decision-making.

Using COUNTIFS for Group Ranking

The core of this tech­nique relies on Excel’s COUNT_IFS func­tion, which allows us to count rows where mul­ti­ple cri­te­ria are met simul­ta­ne­ous­ly. This is per­fect for our group rank­ing because we need to check two things at once: whether a row belongs to the same group as our cur­rent row, and whether its val­ue is greater than or equal to our cur­rent row’s value.

The basic for­mu­la struc­ture looks like this:

=COUNTIFS(group_range, current_group, value_range, “>=”&current_value)

Let’s break down how this works in our example:

  • group_range — This is the col­umn con­tain­ing our group­ing vari­able (town names in col­umn A)
  • current_group — This refers to the town in the cur­rent row we’re eval­u­at­ing (A2 for the first data row)
  • value_range — The col­umn con­tain­ing the val­ues we want to rank (rev­enue)
  • current_value — The spe­cif­ic val­ue in the cur­rent row that we’re try­ing to rank

Step-by-Step Implementation

To imple­ment this rank­ing sys­tem in your own spread­sheet, fol­low these steps:

1. First, ensure your data is orga­nized with a clear group­ing vari­able (like town, region, cat­e­go­ry, etc.)

2. Cre­ate a new col­umn for your rank results

3. In this col­umn, enter the COUNTIFS for­mu­la, which will:

  • Check the group col­umn to find rows in the same group as the cur­rent row
  • Check the val­ue col­umn to find val­ues greater than or equal to the cur­rent row’s value
  • Count how many rows meet both cri­te­ria — this count becomes the rank

The for­mu­la uses absolute ref­er­ences (with F4 to cre­ate $ signs) for the ranges and rel­a­tive ref­er­ences for the cur­rent val­ues. This ensures that when you copy the for­mu­la down, it still refers to the cor­rect columns while adapt­ing to each row’s spe­cif­ic values.

Analyzing the Results

After apply­ing the COUNTIFS for­mu­la to our data, we can see that each item gets a rank with­in its group. In our example:

For the town of Moria, we have ranks from 1 to 5

For Iron­forge, ranks range from 1 to 4

For Azmarin, ranks go from 1 to 3

What’s par­tic­u­lar­ly note­wor­thy is that the ranks don’t have to appear in sequence in your spread­sheet. When we sort the entire dataset by rev­enue from largest to small­est, we can see that the high­est-ranked items from each town appear mixed togeth­er. For exam­ple, after the top per­form­ers from Moria, we might see Iron­forge’s top per­former, then more from Moria, then per­haps Azmar­in’s best.

This demon­strates how our rank­ing works inde­pen­dent­ly with­in each group, regard­less of where the items appear in the sort­ed list. The sec­ond-ranked item in Azmarin might appear as the tenth row in our sort­ed data, but it still cor­rect­ly shows as rank 2 with­in its group.

Practical Applications

This rank­ing tech­nique has numer­ous prac­ti­cal applications:

  • Sales analy­sis — Iden­ti­fy top-per­form­ing prod­ucts with­in each category
  • Employ­ee per­for­mance — Rank staff with­in depart­ments or regions
  • Tour­na­ment results — Rank com­peti­tors with­in age groups or divisions
  • Aca­d­e­m­ic per­for­mance — Rank stu­dents with­in class­es or subjects
  • Mar­ket analy­sis — Com­pare per­for­mance of stocks with­in indus­try sectors

By imple­ment­ing this group rank­ing sys­tem, you can quick­ly iden­ti­fy pat­terns that might oth­er­wise be obscured when look­ing at data as a whole. It allows you to make fair com­par­isons with­in rel­e­vant peer groups rather than across dis­sim­i­lar categories.

Additional Tips for Working with Group Rankings

When imple­ment­ing group rank­ings in your spread­sheets, con­sid­er these help­ful tips:

Col­or-cod­ing your data by groups (as shown in my video) makes it much eas­i­er to visu­al­ly iden­ti­fy the dif­fer­ent cat­e­gories and under­stand the rank­ings at a glance.

You can eas­i­ly mod­i­fy the rank­ing log­ic by chang­ing the oper­a­tor in the for­mu­la. For exam­ple, if you want to rank from small­est to largest instead, you would use “<=” instead of “>=” in your COUNTIFS formula.

For ties (when mul­ti­ple items have the same val­ue), this for­mu­la will assign the same rank to all tied items. If you need to han­dle ties dif­fer­ent­ly, you might need to use more com­plex for­mu­las or addi­tion­al columns.

Remem­ber that this tech­nique works with any type of numer­i­cal data you want to rank — sales fig­ures, scores, times, quan­ti­ties, or any oth­er mea­sur­able met­ric — as long as you have a clear group­ing variable.

Remove Top or Bottom Rows Automatically

Remove Top or Bottom Rows Automatically

Dynam­i­cal­ly remov­ing top and bot­tom rows in Pow­er Query can sig­nif­i­cant­ly stream­line your data clean­ing process when deal­ing with incon­sis­tent data imports. This tech­nique allows you to auto­mat­i­cal­ly elim­i­nate unnec­es­sary head­er rows or foot­er infor­ma­tion based on spe­cif­ic con­di­tions rather than fixed row counts, mak­ing your data trans­for­ma­tion process more robust and adapt­able to chang­ing source files.

The abil­i­ty to use con­di­tions rather than sta­t­ic num­bers is par­tic­u­lar­ly valu­able when work­ing with reg­u­lar­ly updat­ed reports that may con­tain vary­ing amounts of meta­da­ta or sum­ma­ry information.

In my video, I demon­strate how to imple­ment this dynam­ic row removal tech­nique that I learned at an Excel Lon­don Meetup:

Understanding the Problem with Static Row Removal

When import­ing data from text files, CSV files, or oth­er sources, you often encounter extra­ne­ous infor­ma­tion at the top and bot­tom of your dataset. These might include title rows, explana­to­ry notes, sum­ma­ry sta­tis­tics, or foot­er infor­ma­tion that aren’t part of the actu­al data you need to ana­lyze. Using Pow­er Query’s stan­dard “Remove Top Rows” or “Remove Bot­tom Rows” func­tions with a fixed num­ber works fine when your source data struc­ture nev­er changes, but becomes prob­lem­at­ic when the num­ber of these rows varies.

In the exam­ple shown in my video, we have mul­ti­ple rows of meta­da­ta before the actu­al head­er row (which con­tains “Date” as the first col­umn val­ue), and sev­er­al rows of addi­tion­al infor­ma­tion at the bot­tom of the data that need to be removed. There are also some miss­ing val­ues through­out the data that make sim­ple fil­ter­ing ineffective.

Dynamically Removing Top Rows Based on Conditions

The con­ven­tion­al approach to remov­ing top rows in Pow­er Query involves spec­i­fy­ing a fixed num­ber. How­ev­er, this can be prob­lem­at­ic when the num­ber of head­er rows changes. The dynam­ic solu­tion involves using a con­di­tion rather than a fixed count.

Here’s how to imple­ment this technique:

  1. Go to the Home tab in Pow­er Query Editor
  2. Select “Remove Rows” and then “Remove Top Rows”
  3. Instead of enter­ing a num­ber, mod­i­fy the for­mu­la to use the “each” key­word fol­lowed by a condition

The for­mu­la will look some­thing like this: each [Column1] <> “Date” (assum­ing “Date” is the head­er text in your first col­umn). This tells Pow­er Query to keep remov­ing rows until it finds a row where the first col­umn con­tains the text “Date”.

The Technical Details of the Table.Skip Function

Behind the scenes, Pow­er Query uses the Table.Skip func­tion when you remove top rows. This func­tion has a hid­den capa­bil­i­ty that isn’t obvi­ous from the user inter­face — it can accept either a count or a con­di­tion parameter.

When using a con­di­tion, the syn­tax changes from sim­ply pro­vid­ing a num­ber to using the format:

each [condition]

The con­di­tion is eval­u­at­ed for each row, start­ing from the top, and rows are removed until the con­di­tion is no longer true. This allows for dynam­ic adap­ta­tion to vary­ing source data structures.

Dynamically Removing Bottom Rows

Sim­i­lar­ly, we can apply the same con­cept to remove rows from the bot­tom of our dataset. This is par­tic­u­lar­ly use­ful when deal­ing with files that con­tain sum­ma­ry infor­ma­tion, notes, or oth­er foot­er data that should be exclud­ed from analysis.

The process for remov­ing bot­tom rows dynam­i­cal­ly is:

  1. Go to the Home tab in Pow­er Query Editor
  2. Select “Remove Rows” and then “Remove Bot­tom Rows”
  3. Replace the sta­t­ic num­ber with a con­di­tion using the “each” keyword

For exam­ple, you might use a for­mu­la like: each [Mer­chant] = “” to remove rows from the bot­tom where the Mer­chant col­umn con­tains an emp­ty text string. Or you might use each [Rev­enue] = null to remove rows where the Rev­enue col­umn con­tains null values.

Handling Different Types of Empty Values

When work­ing with bot­tom rows, it’s impor­tant to under­stand the dif­fer­ent types of emp­ty val­ues that might appear in your data:

  • Emp­ty text strings — rep­re­sent­ed by “” in formulas
  • Null val­ues — rep­re­sent­ed by null in formulas
  • Miss­ing val­ues — which might be null or emp­ty depend­ing on the data source

In the video demon­stra­tion, I show how to han­dle both emp­ty text strings and null val­ues as con­di­tions for remov­ing bot­tom rows. The key is to iden­ti­fy which col­umn and which type of emp­ty val­ue reli­ably indi­cates the foot­er sec­tion of your data.

Practical Applications and Benefits

This dynam­ic row removal tech­nique is par­tic­u­lar­ly valu­able in sev­er­al scenarios:

When work­ing with reg­u­lar­ly updat­ed reports where the struc­ture might change slight­ly between ver­sions, this approach ensures your Pow­er Query solu­tion remains robust. It’s also help­ful when deal­ing with data exports from sys­tems that include vary­ing amounts of meta­da­ta or when con­sol­i­dat­ing mul­ti­ple files that might have dif­fer­ent head­er structures.

The major ben­e­fits include:

  • More resilient data trans­for­ma­tion process­es that don’t break when source for­mats change slightly
  • Reduced need for man­u­al inter­ven­tion when pro­cess­ing new data
  • Abil­i­ty to han­dle files with incon­sis­tent struc­ture automatically
  • Greater flex­i­bil­i­ty com­pared to sta­t­ic row removal or sim­ple filtering

This tech­nique demon­strates the pow­er of M lan­guage in Pow­er Query, allow­ing for solu­tions that go beyond what’s imme­di­ate­ly avail­able in the user inter­face. By under­stand­ing and lever­ag­ing these more advanced capa­bil­i­ties, you can cre­ate more robust data trans­for­ma­tion processes.

Important Considerations

When imple­ment­ing this tech­nique, keep in mind a few impor­tant points:

The con­di­tion you use must reli­ably iden­ti­fy the bound­ary between the rows you want to keep and those you want to remove. Choose col­umn val­ues that are con­sis­tent­ly present (or con­sis­tent­ly absent) at these bound­aries. Also be aware that if your con­di­tion nev­er eval­u­ates to false, you could poten­tial­ly remove all rows from your dataset, so test­ing with rep­re­sen­ta­tive sam­ple data is essential.

Addi­tion­al­ly, remem­ber that this tech­nique works even when you have miss­ing val­ues in your actu­al data. As shown in the video, the rows are only removed when they match the spe­cif­ic con­di­tion you’ve defined, allow­ing rows with some miss­ing val­ues to be retained as long as they don’t match your removal condition.

Excel Pivot Table Hack: Display Percentages of Parent Row Totals

Excel Pivot Table Hack: Display Percentages of Parent Row Totals

Piv­ot tables offer a pow­er­ful way to ana­lyze data, par­tic­u­lar­ly when you need to under­stand pro­por­tions with­in hier­ar­chi­cal cat­e­gories. In this tuto­r­i­al, I’ll show you how to quick­ly add sums in a piv­ot table and dis­play val­ues as a per­cent­age of their par­ent row, allow­ing for imme­di­ate visu­al­iza­tion of how indi­vid­ual items con­tribute to their cat­e­go­ry totals.

This tech­nique is espe­cial­ly valu­able when ana­lyz­ing sales data across prod­uct cat­e­gories and indi­vid­ual items.

Watch my step-by-step video tuto­r­i­al below to see this process in action:

Setting Up Your Pivot Table

To begin cre­at­ing an infor­ma­tive piv­ot table with per­cent­age cal­cu­la­tions, we need to start with a sim­ple dataset that con­tains hier­ar­chi­cal infor­ma­tion. In my exam­ple, I’m using a dataset that includes cat­e­gories (such as fruits, veg­eta­bles, and sweets) along with the spe­cif­ic prod­ucts with­in each cat­e­go­ry and their cor­re­spond­ing rev­enue figures.

The process of cre­at­ing the piv­ot table is straightforward:

  1. Select your data range
  2. Nav­i­gate to the Insert tab in the Excel ribbon
  3. Click on “Piv­ot Table”
  4. Choose to place the piv­ot table on an exist­ing work­sheet (I select­ed cell F1 in my demonstration)
  5. Click “OK” to cre­ate the basic piv­ot table structure

Once your piv­ot table frame­work is estab­lished, you’ll need to struc­ture it prop­er­ly to show both cat­e­gories and their con­stituent prod­ucts. In the Piv­ot­Table Fields pan­el, drag the appro­pri­ate fields to build your hier­ar­chi­cal view.

Structuring Your Pivot Table

For prop­er hier­ar­chi­cal analy­sis, you’ll want to arrange your fields in a log­i­cal order. In the Rows sec­tion of the Piv­ot­Table Fields pan­el, add your Cat­e­go­ry field first, fol­lowed by the Prod­uct field. This cre­ates a nest­ed struc­ture where prod­ucts appear under their respec­tive categories.

For the val­ues sec­tion, we need to add the Rev­enue field twice — once to show the raw sum and once to show the per­cent­age of par­ent. Sim­ply drag the Rev­enue field to the Val­ues area twice. By default, Excel will sum these val­ues, which is exact­ly what we want for this analysis.

Changing the Layout

By default, Excel dis­plays piv­ot tables in com­pact form, but for bet­ter read­abil­i­ty, I pre­fer the tab­u­lar lay­out. To change this:

  1. Go to the Design tab under Piv­ot­Table Tools
  2. Click on “Report Layout”
  3. Select “Show in Tab­u­lar Form”

This adjust­ment sep­a­rates the Cat­e­go­ry and Prod­uct into dis­tinct columns, mak­ing your data more read­able and eas­i­er to ana­lyze at a glance.

Adding Percentage of Parent Row

Now comes the key part — trans­form­ing one of our rev­enue columns to show per­cent­age of par­ent row. This cal­cu­la­tion will show how each prod­uct con­tributes pro­por­tion­al­ly to its cat­e­go­ry total, and how each cat­e­go­ry con­tributes to the grand total. Fol­low these steps:

  1. Right-click on any cell with­in the sec­ond Sum of Rev­enue column
  2. Select “Show Val­ues As” from the con­text menu
  3. Choose “% of Par­ent Row Total”

This sim­ple change trans­forms the raw num­bers into per­cent­ages, giv­ing you imme­di­ate insight into the pro­por­tion­al con­tri­bu­tion of each item. For instance, in my exam­ple, you can now see that apples rep­re­sent approx­i­mate­ly 35% of all fruit sales, while the fruits cat­e­go­ry as a whole rep­re­sents about 41% of total sales across all categories.

Understanding the Results

After apply­ing the per­cent­age of par­ent row cal­cu­la­tion, your piv­ot table auto­mat­i­cal­ly adjusts to show mean­ing­ful pro­por­tions at every level:

  • Indi­vid­ual prod­ucts show their per­cent­age con­tri­bu­tion to their imme­di­ate category
  • Cat­e­go­ry subto­tals show their per­cent­age con­tri­bu­tion to the grand total
  • The grand total always equals 100%

In my demon­stra­tion, this clear­ly showed that sweets account­ed for approx­i­mate­ly 40% of total sales, veg­eta­bles for about 18%, and fruits for approx­i­mate­ly 41%. With­in each cat­e­go­ry, you can sim­i­lar­ly see the pro­por­tion­al con­tri­bu­tion of each product.

Finalizing Your Pivot Table

To make your piv­ot table more under­stand­able, it’s impor­tant to rename the col­umn head­ers to accu­rate­ly reflect what each col­umn rep­re­sents. In our case:

  1. Change the head­ing of the first sum col­umn to sim­ply read “Sum”
  2. Rename the sec­ond col­umn to “Per­cent­age of Parent”

These descrip­tive head­ers ensure that any­one view­ing your piv­ot table will imme­di­ate­ly under­stand what the num­bers rep­re­sent with­out need­ing addi­tion­al explanation.

With these adjust­ments com­plete, you now have a com­pre­hen­sive piv­ot table that not only shows the raw rev­enue fig­ures but also clear­ly illus­trates the pro­por­tion­al rela­tion­ships between cat­e­gories and their con­stituent prod­ucts. This dual-view approach pro­vides both absolute and rel­a­tive per­spec­tives on your data, enabling more nuanced analy­sis and decision-making.

This tech­nique is par­tic­u­lar­ly valu­able when ana­lyz­ing sales per­for­mance, bud­get allo­ca­tions, or any hier­ar­chi­cal data where under­stand­ing pro­por­tion­al rela­tion­ships is impor­tant. By visu­al­iz­ing both raw num­bers and per­cent­ages simul­ta­ne­ous­ly, you gain deep­er insights into your data struc­ture and can more effec­tive­ly com­mu­ni­cate those insights to others.

How to Calculate Hierarchy Levels in Power Query

How to Calculate Hierarchy Levels in Power Query

In this tuto­r­i­al, I demon­strate a prac­ti­cal appli­ca­tion of recur­sive func­tions in Pow­er Query to cal­cu­late hier­ar­chy lev­els in orga­ni­za­tion­al struc­tures or MLM sys­tems. Using a cus­tom func­tion with recur­sion, we can effi­cient­ly deter­mine each per­son­’s posi­tion in a hier­ar­chi­cal struc­ture based on their refer­rer rela­tion­ships, pro­vid­ing valu­able insights for orga­ni­za­tion­al analy­sis and reporting.

Under­stand­ing hier­ar­chy lev­els is essen­tial for visu­al­iz­ing report­ing struc­tures, track­ing MLM down­lines, or map­ping any par­ent-child rela­tion­ships in your data.

Understanding the Problem: Hierarchy Levels in Organizations

In many orga­ni­za­tion­al struc­tures, par­tic­u­lar­ly in mul­ti-lev­el mar­ket­ing (MLM) sys­tems or cor­po­rate hier­ar­chies, under­stand­ing the lev­el depth of each mem­ber is cru­cial. The lev­el rep­re­sents how many steps a per­son is from the top of the orga­ni­za­tion. For instance, in our exam­ple, John is at the top (lev­el 0), N is direct­ly below John (lev­el 1), and Thomas is below N (lev­el 2).

Our sam­ple data con­tains an ID col­umn that unique­ly iden­ti­fies each per­son and a Refer­rer col­umn that indi­cates who brought that per­son into the orga­ni­za­tion. The refer­ral rela­tion­ship estab­lish­es the hier­ar­chi­cal struc­ture we need to ana­lyze. Our goal is to cal­cu­late each per­son­’s hier­ar­chy lev­el auto­mat­i­cal­ly using Pow­er Query’s recur­sive capabilities.

Setting Up Power Query

To begin work­ing with our data, we need to import it into Pow­er Query where we can cre­ate and apply our recur­sive function:

  1. Select any cell in your data table
  2. Go to the Data tab
  3. Click From Table/Range to import your data into Pow­er Query

This imports your data con­tain­ing the ID and Refer­rer columns into the Pow­er Query Edi­tor, where we can start build­ing our solution.

Creating a Recursive Function to Calculate Hierarchy Levels

The core of our solu­tion is a cus­tom func­tion that can call itself (recur­sion) to track up through the hier­ar­chy until it reach­es the top. Here’s how to cre­ate it:

  1. In the Pow­er Query Edi­tor, go to the Home tab
  2. Click New Source > Oth­er Sources > Blank Query

Now we need to define our func­tion. Our func­tion will require two para­me­ters: the per­son­’s ID we want to cal­cu­late the lev­el for and the com­plete table of peo­ple data to ref­er­ence. The func­tion will:

  1. Find the row for the cur­rent person
  2. Get their refer­rer’s ID
  3. Check if they have a referrer
  4. If they don’t (they’re at the top), return 0
  5. If they do have a refer­rer, call the same func­tion for the refer­rer and add 1 to the result

The M code for our func­tion looks like this:

(per­son­ID, personTable) =>
let
per­son­Row = Table.SelectRows(personTable, each [ID] = personID){0},
per­son­AboveID = personRow[Referrer],
result = if per­son­AboveID = null
then 0
else @HierarchyLevel(personAboveID, per­son­Table) + 1
in
result

Make sure to name your func­tion Hier­ar­chyLev­el so that the recur­sive ref­er­ence to itself works prop­er­ly. The @ sym­bol in front of the func­tion name empha­sizes that we’re call­ing the same func­tion again.

Applying the Function to Our Data

After cre­at­ing our func­tion, we need to apply it to every row in our data table:

  1. Go back to your main query with the import­ed data
  2. Click Add Col­umn > Invoke Cus­tom Function
  3. Select your Hier­ar­chyLev­el function
  4. For per­son­ID, select the ID column
  5. For per­son­Table, we need to ref­er­ence the cur­rent table

When set­ting the per­son­Table para­me­ter, we ini­tial­ly might try to ref­er­ence a col­umn name, but this will gen­er­ate errors. Instead, we need to ref­er­ence the entire table from the pre­vi­ous step. In Pow­er Query, we can do this by refer­ring to the pre­vi­ous step name.

Once cor­rect­ly con­fig­ured, the func­tion will cal­cu­late the hier­ar­chy lev­el for each per­son in our table. Thomas, who is referred by N (ID 2) who in turn is referred by John (ID 1), will show as lev­el 2. John, who has no refer­rer, will be at lev­el 0.

Testing and Validating the Recursion

To ver­i­fy our func­tion works cor­rect­ly, we can exam­ine the cal­cu­lat­ed lev­els for each per­son in our organization:

  • John (ID 1): Lev­el 0 (top of hier­ar­chy, no referrer)
  • Anne (ID 2): Lev­el 1 (referred by John)
  • Thomas (ID 5): Lev­el 2 (referred by Anne)
  • Paul: Lev­el 3 (referred by Thomas)

We can fur­ther test by chang­ing refer­rer rela­tion­ships. For exam­ple, if we change Paul’s refer­rer from Thomas (ID 5) to some­one who is already at lev­el 3, Paul would then become lev­el 4. After mak­ing such changes in the source data, we can sim­ply refresh our Pow­er Query to see the updat­ed hier­ar­chy levels.

Handling Edge Cases in Your Hierarchy

Our recur­sive func­tion han­dles sev­er­al impor­tant edge cases:

  1. Top-lev­el mem­bers (those with null refer­rers) are assigned lev­el 0
  2. The func­tion will work for orga­ni­za­tions of any depth, con­tin­u­ing to recurse up the chain until it reach­es the top
  3. If the data con­tains cir­cu­lar ref­er­ences (Per­son A refers to Per­son B who refers back to Per­son A), the recur­sion could cre­ate an infi­nite loop — con­sid­er adding error han­dling for this sce­nario in real applications

In a real-world sce­nario, you might want to enhance this func­tion to han­dle more com­plex require­ments, such as detect­ing cir­cu­lar ref­er­ences or pro­cess­ing mul­ti­ple hier­ar­chies with­in the same dataset.

Loading the Results Back to Excel

Once you’re sat­is­fied with your hier­ar­chy lev­el calculations:

  1. In the Pow­er Query Edi­tor, go to Home > Close & Load
  2. Your data table with the new hier­ar­chy lev­el col­umn will appear in Excel
  3. Any time your hier­ar­chy changes, sim­ply refresh the query to recal­cu­late all levels

This pow­er­ful tech­nique allows you to main­tain an up-to-date view of your orga­ni­za­tion­al struc­ture with min­i­mal effort. The recur­sive approach effi­cient­ly han­dles even large hier­ar­chies with­out requir­ing com­plex for­mu­las or man­u­al tracking.

With this solu­tion in place, you can eas­i­ly build reports and visu­al­iza­tions that lever­age hier­ar­chy lev­el infor­ma­tion, enabling bet­ter insights into your orga­ni­za­tion­al struc­ture, MLM down­lines, or any hier­ar­chi­cal data you’re work­ing with.

How to Split Text by Position in Power Query

How to Split Text by Position in Power Query

Split­ting text by length in Pow­er Query can trans­form clut­tered data into orga­nized, usable infor­ma­tion with­out com­plex for­mu­las. This tech­nique allows you to break down text strings into sep­a­rate columns based on spe­cif­ic char­ac­ter posi­tions, mak­ing it espe­cial­ly use­ful when deal­ing with fixed-width data for­mats that con­tain mul­ti­ple pieces of information.

The real pow­er of this approach lies in its abil­i­ty to han­dle irreg­u­lar split­ting require­ments where each sec­tion has a dif­fer­ent length.

In my video, I demon­strate the com­plete process of split­ting text by length in Pow­er Query:

Understanding the Data Structure

When work­ing with text data that needs to be split, it’s essen­tial to first ana­lyze the struc­ture of your text. In the demon­stra­tion, I work with a dataset where each text string con­tains sev­er­al pieces of infor­ma­tion with vary­ing lengths:

  • Per­son infor­ma­tion (30 characters)
  • Delim­iter char­ac­ters (semi­colons, pipes) that need to be removed
  • Date infor­ma­tion (appear­ing as numbers)
  • Cur­ren­cy val­ues in dif­fer­ent formats

The chal­lenge lies in the fact that each sec­tion has a dif­fer­ent length, mak­ing stan­dard split func­tions less effec­tive. This is pre­cise­ly where Pow­er Query’s split by posi­tion fea­ture becomes invaluable.

Importing Data to Power Query

The first step in the process is to import your data into Pow­er Query. This can be done eas­i­ly by select­ing your data table and using the From Table/Range option in the Data tab. Once your data is in Pow­er Query, you’ll have access to pow­er­ful trans­for­ma­tion tools that aren’t avail­able in stan­dard Excel.

Pow­er Query pro­vides a user-friend­ly inter­face where you can see your data and apply var­i­ous trans­for­ma­tions step by step. This visu­al approach makes it eas­i­er to track changes and ensure that your data is being processed correctly.

Splitting Text by Position

With the data import­ed into Pow­er Query, we can now split the text col­umn based on spe­cif­ic posi­tions. Here’s how to do it:

  1. Go to the Home tab in Pow­er Query
  2. Select Split Col­umn and choose “By Posi­tions” (not “By Num­ber of Characters”)
  3. Enter the spe­cif­ic posi­tions where you want to split the text

In the exam­ple, I need­ed to split at posi­tions 0, 30, 31, 39, and 41. It’s impor­tant to note that Pow­er Query counts from zero for the first posi­tion, not one. These num­bers rep­re­sent the start­ing points for each sec­tion of text.

After press­ing OK, Pow­er Query cre­ates new columns based on these posi­tion splits. The result is five sep­a­rate columns, each con­tain­ing a dis­tinct part of the orig­i­nal text string.

Refining the Split Data

Renaming and Removing Columns

After split­ting the text, we need to orga­nize our data by giv­ing mean­ing­ful names to the impor­tant columns and remov­ing unnec­es­sary ones. In the for­mu­la bar, I renamed the columns to reflect their content:

  • “Text.1” became “Per­son”
  • “Text.3” became “Date”
  • “Text.5” became “Pay­out”

For columns con­tain­ing delim­iters or oth­er unwant­ed infor­ma­tion (in this case, “Text.2” and “Text.4”), we can sim­ply delete them by select­ing them with Ctrl+click and press­ing the Delete key.

Correcting Data Types

Once we have our prop­er­ly named columns, we need to ensure that each col­umn has the cor­rect data type. In the demonstration:

  • The “Date” col­umn was ini­tial­ly rec­og­nized as an inte­ger and need­ed to be con­vert­ed to the date data type
  • The “Pay­out” col­umn con­tained cur­ren­cy val­ues in dif­fer­ent for­mats that need­ed prop­er interpretation

Pow­er Query can auto­mat­i­cal­ly detect and con­vert data types, but some­times man­u­al inter­ven­tion is nec­es­sary. By click­ing on the data type icon in the col­umn head­er, you can force Pow­er Query to inter­pret the data as a spe­cif­ic type.

Working with Regional Settings

An impor­tant aspect of work­ing with dates and cur­ren­cy val­ues is under­stand­ing how region­al set­tings affect data inter­pre­ta­tion. In Pow­er Query, you can adjust these set­tings to match the for­mat of your data.

To access these set­tings, go to:

  1. Options and Settings
  2. Query Options
  3. Cur­rent Workbook
  4. Region­al Settings

In my demon­stra­tion, the region­al set­tings were set to “Eng­lish (Unit­ed King­dom)” which allowed Pow­er Query to cor­rect­ly inter­pret the pound (£) cur­ren­cy sym­bols regard­less of their posi­tion in the text. If your data uses dif­fer­ent region­al for­mats, you can adjust these set­tings to match your needs.

For indi­vid­ual columns, you can also click on the data type icon and select “Using Locale” to spec­i­fy both the data type and region­al for­mat for that par­tic­u­lar col­umn. This gives you fine-grained con­trol over how Pow­er Query inter­prets your data.

Finalizing the Transformation

After com­plet­ing all the nec­es­sary trans­for­ma­tions — split­ting the text, renam­ing columns, remov­ing unnec­es­sary columns, and set­ting the cor­rect data types — the final step is to load the trans­formed data back into Excel.

From the Home tab, select “Close & Load” to export your prop­er­ly orga­nized table back to Excel. The result is a clean, struc­tured dataset with sep­a­rate columns for per­son infor­ma­tion, dates, and pay­ment val­ues, all with the appro­pri­ate data types.

This tech­nique of split­ting text by posi­tion in Pow­er Query is par­tic­u­lar­ly use­ful when deal­ing with fixed-width data exports from lega­cy sys­tems, stan­dard­ized report out­puts, or any sit­u­a­tion where text strings con­tain mul­ti­ple data points at known posi­tions. By mas­ter­ing this approach, you can quick­ly trans­form dense, com­bined text fields into orga­nized and usable data.

How to Import Fixed-Width Files to Excel

How to Import Fixed-Width Files to Excel

Import­ing fixed-width col­umn data to Excel can be chal­leng­ing when each col­umn has a set num­ber of char­ac­ters, often result­ing in unnec­es­sary spaces. In this arti­cle, I’ll show you how to prop­er­ly import fixed-width CSV files using Pow­er Query, ensur­ing clean data with­out those prob­lem­at­ic extra spaces.

This tech­nique is essen­tial for han­dling struc­tured data exports from lega­cy sys­tems or stan­dard­ized reports.

Understanding Fixed-Width Column Data

Fixed-width col­umn data is a spe­cif­ic for­mat where each col­umn occu­pies exact­ly the same num­ber of char­ac­ters in every row. Unlike delim­it­ed for­mats (like com­ma-sep­a­rat­ed val­ues), fixed-width files allo­cate a pre­de­ter­mined num­ber of spaces for each field. This means short­er val­ues get padded with spaces to main­tain the fixed width. For exam­ple, the date col­umn might always be 10 char­ac­ters wide, the mer­chant col­umn 20 char­ac­ters, and so on.

This for­mat is com­mon in old­er sys­tems, gov­ern­ment data, and stan­dard­ized reports, but it presents unique chal­lenges when import­ing to Excel. The main issue is that these extra spaces can cause prob­lems with data analy­sis and visu­al­iza­tion if not prop­er­ly han­dled dur­ing the import process.

Importing Fixed-Width Data Using Power Query

Pow­er Query pro­vides excel­lent tools for han­dling fixed-width col­umn data. Here’s the process to prop­er­ly import this type of data:

Initial Import Steps

To begin import­ing fixed-width data, fol­low these steps:

  1. Go to the Data tab in Excel
  2. Click on “From Text/CSV” option
  3. Locate and select your fixed-width file

When the Pow­er Query Edi­tor opens, you might need to adjust the win­dow size to see all avail­able options, as some­times the third col­umn of set­tings may be hid­den if the win­dow is too nar­row. Make sure to expand the win­dow to see all import settings.

Configuring Import Settings

Dur­ing the import process, you’ll need to con­fig­ure sev­er­al impor­tant settings:

First, check the file ori­gin (encod­ing) — UTF‑8 works well for most mod­ern files. Then, ver­i­fy that the delim­iter is set cor­rect­ly. For fixed-width files, Pow­er Query typ­i­cal­ly auto­mat­i­cal­ly detects this for­mat. You should see “Fixed Width” select­ed rather than com­ma, tab, or oth­er delimiters.

The data type detec­tion set­ting is anoth­er cru­cial choice. You have three options:

  • Based on first 200 rows
  • Based on entire dataset
  • Do not detect data types

In most cas­es with fixed-width data, it’s bet­ter to select “Do not detect data types” ini­tial­ly and han­dle data type con­ver­sion after clean­ing the spaces. This pre­vents Pow­er Query from mak­ing assump­tions about your data types that might be affect­ed by the extra spaces.

Handling Extra Spaces in Power Query

The key chal­lenge with fixed-width data is man­ag­ing the extra spaces that pad each col­umn. When exam­in­ing the data in Pow­er Query, you’ll notice these extra spaces in both the col­umn head­ers and text fields. These spaces are nec­es­sary in the orig­i­nal file to main­tain the fixed-width for­mat, but they’re prob­lem­at­ic for analy­sis in Excel.

For exam­ple, when click­ing on a col­umn head­er, you might see some­thing like “Date ” with sev­er­al spaces after the actu­al name. Sim­i­lar­ly, text fields like prod­uct names or mer­chant names will have trail­ing spaces to fill their allo­cat­ed width.

Removing Extra Spaces

To clean up the data properly:

  1. Select all columns in the Pow­er Query Edi­tor by click­ing the first col­umn head­er and then Shift-click­ing the last column
  2. Go to the Trans­form tab
  3. Click on “For­mat” and select “Trim”

This impor­tant step removes all lead­ing and trail­ing white spaces from every cell in your dataset while pre­serv­ing spaces that are actu­al­ly part of your data (like spaces between words in prod­uct names). After trim­ming the data, you’ll notice a sig­nif­i­cant improve­ment in how your data looks.

Setting Headers and Data Types

After remov­ing the extra spaces, you can pro­ceed with con­fig­ur­ing head­ers and data types:

Use the “Use First Row as Head­ers” option (found in the Trans­form tab or by right-click­ing on the first row) to set your head­ers prop­er­ly. Now that the spaces have been removed, your col­umn names will appear clean and professional.

With clean data, you can now set appro­pri­ate data types for each col­umn. For exam­ple, date columns should be for­mat­ted as dates, and rev­enue or cost columns as cur­ren­cy or dec­i­mal num­bers. Pow­er Query will help iden­ti­fy appro­pri­ate types, but you can man­u­al­ly change them by click­ing the data type icon in each col­umn header.

Loading Data to Excel

The final step is to load your cleaned data into Excel:

  1. Click “Close & Load” (or “Close & Load To” for more options)
  2. Select whether to load as a table or to a spe­cif­ic location
  3. Choose an exist­ing work­sheet or cre­ate a new one

When load­ing the data, you have the option to add it to the data mod­el if you plan to use it with Pow­er Piv­ot, but this isn’t nec­es­sary for basic Excel analy­sis. One impor­tant thing to note is that while Pow­er Query may detect cur­ren­cy data types, this does­n’t auto­mat­i­cal­ly apply Excel’s cur­ren­cy for­mat­ting to your table. You’ll need to apply appro­pri­ate num­ber for­mat­ting in Excel after importing.

For­tu­nate­ly, when you refresh your data in the future, Excel will main­tain any num­ber for­mat­ting you’ve applied to the table, mak­ing future updates seamless.

Benefits of Using Power Query for Fixed-Width Data

Using Pow­er Query for import­ing fixed-width data offers sev­er­al advan­tages over oth­er methods:

The abil­i­ty to trans­form data before load­ing it to Excel saves sig­nif­i­cant time com­pared to man­u­al cleanup after import. The process is also repeat­able — if you receive updat­ed ver­sions of the same file, you can sim­ply refresh your query rather than repeat­ing the entire import and cleanup process.

Addi­tion­al­ly, Pow­er Query main­tains a record of all trans­for­ma­tions applied, mak­ing your data prepa­ra­tion process trans­par­ent and eas­i­ly adjustable if need­ed. This approach ensures con­sis­ten­cy in how your fixed-width data is processed each time.

Merge Tables with Summarized Data for Cleaner Reports

Merge Tables with Summarized Data for Cleaner Reports

Merg­ing tables in Pow­er Query with sum­ma­rized data pro­vides a pow­er­ful way to con­sol­i­date infor­ma­tion with­out over­whelm­ing detail. By aggre­gat­ing invoice details such as counts, sums, aver­ages, and medi­ans, you can cre­ate con­cise reports that high­light key insights while main­tain­ing the orig­i­nal table structure.

This tech­nique is par­tic­u­lar­ly valu­able when deal­ing with sales data where each invoice con­tains mul­ti­ple line items that need to be condensed.

In my video tuto­r­i­al, I demon­strate how to effec­tive­ly merge tables with sum­ma­riza­tion in Pow­er Query:

Understanding the Problem: Merging Tables with Summarization

When work­ing with rela­tion­al data like invoic­es and their detailed items, we often need to com­bine infor­ma­tion from mul­ti­ple tables. How­ev­er, sim­ply merg­ing tables can result in redun­dant data and unnec­es­sary com­plex­i­ty. The chal­lenge is to merge data while simul­ta­ne­ous­ly sum­ma­riz­ing the details to pro­vide mean­ing­ful insights.

In this demon­stra­tion, I work with two tables: one con­tain­ing basic invoice infor­ma­tion and anoth­er with detailed line items for each invoice. The goal is to enhance the invoice table with sum­ma­rized infor­ma­tion from the details table, such as the count of items, aver­age quan­ti­ties, sum of prices, and even sta­tis­ti­cal mea­sures like medians.

Loading Tables into Power Query

The first step in our process is to load both tables into Pow­er Query for manip­u­la­tion. This gives us access to the pow­er­ful data trans­for­ma­tion capa­bil­i­ties that allow for sophis­ti­cat­ed merg­ing and summarization.

To begin, I nav­i­gate to the Data tab in Excel and select “From Table/Range” to import our first table into Pow­er Query. For this ini­tial details table, I choose to load it only as a con­nec­tion rather than import­ing it direct­ly into the work­sheet. This is done by select­ing “Close and Load To” and then choos­ing the “Con­nec­tion only” option.

After load­ing the details table, I fol­low the same process to load the invoice table. This time, how­ev­er, we’ll per­form our merge oper­a­tion direct­ly with­in this query rather than cre­at­ing a new one.

Merging the Tables

With both tables loaded into Pow­er Query, we can now merge them using the com­mon invoice ID field. This is where the pow­er of our approach becomes evident.

To per­form the merge:

  1. Select the “Merge Queries” com­mand from the ribbon
  2. Choose the sec­ond table (details) to merge with
  3. Select match­ing columns for merg­ing (invoice ID in both tables)
  4. Choose “Left Out­er Join” as the join kind to ensure all invoic­es are included

After press­ing OK, Pow­er Query cre­ates a new col­umn in our invoice table that con­tains the match­ing rows from the details table. You’ll notice that for each invoice, we now have a nest­ed table show­ing all the cor­re­spond­ing detail rows. For exam­ple, some invoic­es might have two detail rows, oth­ers three, and some even five or more.

The Key Difference: Aggregating Instead of Expanding

The crit­i­cal step that dif­fer­en­ti­ates this approach from a stan­dard merge is what we do next. Instead of expand­ing these nest­ed tables (which would cre­ate dupli­cate invoice rows), we want to aggre­gate the infor­ma­tion they contain.

When we click on the expand but­ton (the small tri­an­gle) in the col­umn head­er, we get options to expand or aggre­gate. By choos­ing to aggre­gate, we can per­form cal­cu­la­tions across all the detail rows for each invoice.

Creating Summarized Metrics

The pow­er of this tech­nique lies in the vari­ety of aggre­ga­tions we can per­form on the detailed data. For our invoice exam­ple, I demon­strate sev­er­al use­ful metrics:

  • Count of items: Sim­ple count of how many line items exist for each invoice
  • Aver­age quan­ti­ty: Instead of sum­ming quan­ti­ties, we cal­cu­late the aver­age num­ber of items per line
  • Sum of unit price: Total val­ue of the invoice
  • Medi­an of unit price: Sta­tis­ti­cal mea­sure show­ing the mid­dle val­ue of all unit prices on the invoice

To cre­ate these aggre­ga­tions, I select the appro­pri­ate options from the drop­down menu when expand­ing the nest­ed table col­umn. By uncheck­ing “Use orig­i­nal col­umn name as pre­fix,” we get clean­er col­umn names in our results.

Refining the Results

After cre­at­ing our aggre­gat­ed columns, we may want to refine the col­umn names for clar­i­ty. This can be done direct­ly in the for­mu­la bar in Pow­er Query. For exam­ple, I rename one of the columns to “Count” to bet­ter reflect its purpose.

It’s also impor­tant to check the data types of our new columns. Pow­er Query auto­mat­i­cal­ly assigns data types, but they may need adjust­ment. In our case, the “Count” col­umn is cor­rect­ly set as a dec­i­mal num­ber, which works well since counts will always be whole numbers.

When work­ing with mon­e­tary val­ues like the sum of unit prices, Pow­er Query may pre­serve the cur­ren­cy data type. How­ev­er, it’s worth not­ing that this data type infor­ma­tion does­n’t auto­mat­i­cal­ly trans­late to for­mat­ting in Excel. We’ll need to apply cur­ren­cy for­mat­ting in Excel after load­ing the data.

Loading the Final Results

Once we’re sat­is­fied with our merged and sum­ma­rized table, we can load it back to Excel. From the Home tab in Pow­er Query, I select “Close and Load To” and choose to cre­ate a new table in the exist­ing worksheet.

The final result is an enhanced invoice table that includes aggre­gat­ed infor­ma­tion from the details table with­out dupli­cat­ing invoice records. This gives us a con­cise view of our data with valu­able insights such as the num­ber of items per invoice, aver­age quan­ti­ties, total val­ues, and sta­tis­ti­cal measures.

This tech­nique is par­tic­u­lar­ly valu­able for finan­cial analy­sis, sales report­ing, and any sit­u­a­tion where you need to com­bine mas­ter records with sum­ma­rized infor­ma­tion from relat­ed detail records. By lever­ag­ing Pow­er Query’s merg­ing and aggre­ga­tion capa­bil­i­ties, we can cre­ate more infor­ma­tive reports while main­tain­ing a clean, non-redun­dant data structure.

How to Create Sales Summaries with Power Query

How to Create Sales Summaries with Power Query

Sales data analy­sis often requires sum­ma­riz­ing results by spe­cif­ic cat­e­gories to gain valu­able busi­ness insights. In this arti­cle, I demon­strate how to effi­cient­ly cal­cu­late sales sum­maries using Pow­er Query’s group­ing func­tion­al­i­ty, pro­vid­ing a pow­er­ful alter­na­tive to tra­di­tion­al Excel for­mu­las for ana­lyz­ing mer­chant per­for­mance, income totals, and prod­uct distribution.

The group­ing tech­nique shown allows for mul­ti­ple aggre­ga­tions in a sin­gle oper­a­tion, dra­mat­i­cal­ly improv­ing effi­cien­cy when work­ing with larg­er datasets.

Watch my detailed tuto­r­i­al video below to fol­low along with each step:

Understanding the Sales Summary Challenge

When ana­lyz­ing sales data, we often need to cre­ate sum­maries that show per­for­mance met­rics grouped by spe­cif­ic cat­e­gories. In this demon­stra­tion, I’m work­ing with a dataset con­tain­ing sales trans­ac­tions that include mer­chants, prod­ucts, dates, and income val­ues. My goal is to sum­ma­rize this data to show sev­er­al key met­rics for each merchant:

  • Count of sales transactions
  • Sum of income
  • Aver­age income per transaction
  • List of unique prod­ucts sold

While these cal­cu­la­tions can be done using Excel for­mu­las (which I’ve pre­pared in the work­sheet for com­par­i­son), using Pow­er Query offers sig­nif­i­cant advan­tages, espe­cial­ly for larg­er datasets or repet­i­tive analy­sis tasks.

Importing Data into Power Query

The first step in our process is to bring the data into Pow­er Query where we can per­form the group­ing operations:

  1. Select the data table in Excel
  2. Nav­i­gate to the Data tab in the Excel ribbon
  3. Click “From Table/Range” to import the data into Pow­er Query

Once the data is loaded into the Pow­er Query Edi­tor, we can see all our sales records and begin the group­ing process to cre­ate our summary.

Creating Groups in Power Query

To sum­ma­rize our data by mer­chant, we need to use the Group By func­tion in Pow­er Query:

  1. Select the Mer­chant col­umn in the Pow­er Query Editor
  2. Go to the Home tab and click “Group By”

The default Group By dia­log offers basic func­tion­al­i­ty with just one col­umn for group­ing and a sin­gle aggre­ga­tion. For our needs, we require the Advanced option since we want to cre­ate mul­ti­ple aggre­ga­tions based on the mer­chant column.

Setting Up Advanced Grouping

In the advanced Group By dia­log, we need to con­fig­ure mul­ti­ple aggregations:

  1. Set the group­ing col­umn to Mer­chant
  2. Add the fol­low­ing aggregations: 
    • “Count Rows” for the num­ber of sales (named “Count”)
    • “Sum” of the Income col­umn (named “Sum”)
    • “Aver­age” of the Income col­umn (named “Aver­age”)
    • “All Rows” to cap­ture all relat­ed data (named “List”)

After click­ing OK, Pow­er Query cre­ates a new table with our mer­chant groups and the spec­i­fied aggre­ga­tions. The Count, Sum, and Aver­age val­ues match the Excel for­mu­las we pre­pared ear­li­er, con­firm­ing that our group­ing is work­ing correctly.

Customizing the Product List

The “All Rows” aggre­ga­tion cre­ates a nest­ed table for each mer­chant con­tain­ing all their sales records. How­ev­er, we only want a list of unique prod­ucts sold by each mer­chant, not all the row data. Since the Group By dia­log does­n’t offer this spe­cif­ic option, we need to mod­i­fy the M for­mu­la directly.

Modifying the M Formula

To cus­tomize our prod­uct list, we need to edit the M code that Pow­er Query generated:

  1. The default “All Rows” aggre­ga­tion pro­duces tables with all columns for each merchant
  2. We need to change this to extract only the Prod­uct col­umn values
  3. Fur­ther­more, we need to elim­i­nate dupli­cates from these lists

The orig­i­nal for­mu­la includes {_} which ref­er­ences the entire row. We need to replace this with a ref­er­ence to just the Prod­uct col­umn: [Prod­uct]. Addi­tion­al­ly, we need to wrap this in the List.Distinct func­tion to remove duplicates:

The mod­i­fied for­mu­la sec­tion should look like: List.Distinct([Product])

Transforming the Product List into Text

Now that we have lists of unique prod­ucts for each mer­chant, we can trans­form these lists into com­ma-sep­a­rat­ed text val­ues for bet­ter readability:

  1. Click on the expand but­ton in the List col­umn header
  2. Select “Extract Val­ues…” rather than “Expand to New Rows”
  3. Choose a cus­tom delim­iter (com­ma fol­lowed by space: “, ”)
  4. Click OK

This trans­forms our prod­uct lists into read­able text strings show­ing all the unique prod­ucts each mer­chant has sold, sep­a­rat­ed by commas.

Loading the Results Back to Excel

Once we’ve cre­at­ed our sum­ma­ry with all the required met­rics, we can load the results back to Excel:

  1. Go to the Home tab and click “Close & Load To…”
  2. Select “Exist­ing Work­sheet” and choose where to place the results
  3. Click OK

After load­ing, we can for­mat the Sum and Aver­age columns as cur­ren­cy to improve read­abil­i­ty. The final result is a clean sum­ma­ry table show­ing the count of sales, total income, aver­age income, and unique prod­ucts for each merchant.

Benefits of Using Power Query for Grouping

While our exam­ple used a rel­a­tive­ly small dataset, the advan­tages of using Pow­er Query for this type of analy­sis become even more appar­ent with larg­er data volumes:

  • Effi­cien­cy — Per­forms cal­cu­la­tions much faster than Excel for­mu­las for large datasets
  • Repeata­bil­i­ty — The query can be refreshed when data changes with­out redo­ing formulas
  • Flex­i­bil­i­ty — Allows for com­plex aggre­ga­tions and trans­for­ma­tions that would be dif­fi­cult with Excel formulas
  • Data clean­ing — Inte­grates with oth­er Pow­er Query trans­for­ma­tions for com­pre­hen­sive data preparation

By mas­ter­ing this group­ing tech­nique in Pow­er Query, you can cre­ate sophis­ti­cat­ed sales sum­maries and oth­er data aggre­ga­tions with just a few clicks, sav­ing sig­nif­i­cant time and effort com­pared to tra­di­tion­al Excel methods.

Automate Student Exam Grading with Excel Power Query

Automate Student Exam Grading with Excel Power Query

Using append­ing in Pow­er Query offers a stream­lined approach to auto­mat­i­cal­ly grade stu­dent exams based on pre­de­fined thresh­olds. This method elim­i­nates man­u­al grad­ing errors and saves sig­nif­i­cant time when pro­cess­ing large sets of stu­dent scores, pro­vid­ing an effi­cient way to man­age aca­d­e­m­ic assessments.

The tech­nique com­bines stu­dent scores with grade thresh­olds to gen­er­ate accu­rate results that can be eas­i­ly updat­ed when grad­ing cri­te­ria change.

Watch my step-by-step tuto­r­i­al on how to imple­ment this grad­ing sys­tem using Pow­er Query:

Understanding the Grading Process with Power Query

Pow­er Query’s append­ing fea­ture allows us to com­bine mul­ti­ple tables in a way that cre­ates rela­tion­ships between stu­dent scores and grade thresh­olds. The append­ing process essen­tial­ly stacks tables on top of each oth­er, cre­at­ing a sin­gle dataset that we can manip­u­late to auto­mat­i­cal­ly assign grades. This approach is par­tic­u­lar­ly use­ful in edu­ca­tion­al set­tings where instruc­tors need to process mul­ti­ple exam results efficiently.

The pri­ma­ry advan­tage of using this method is that when grad­ing cri­te­ria change, you don’t need to man­u­al­ly reas­sign grades to each stu­dent. Sim­ply update the thresh­old val­ues in your source table, refresh the query, and all grades will auto­mat­i­cal­ly update to reflect the new standards.

Required Resources

Before start­ing the grad­ing process, you’ll need two essen­tial tables:

  • A table con­tain­ing stu­dent names and their exam points
  • A table defin­ing grade thresh­olds (what point ranges cor­re­spond to which let­ter grades)

These tables will serve as the foun­da­tion for our auto­mat­ed grad­ing sys­tem. The struc­ture of these tables does­n’t need to be complex—just clear­ly orga­nized with appro­pri­ate head­ers to iden­ti­fy the data.

Setting Up the Tables in Power Query

To begin imple­ment­ing the grad­ing sys­tem, we need to import both tables into Pow­er Query. This process allows us to manip­u­late and com­bine the data efficiently.

First, with the stu­dent scores table select­ed, nav­i­gate to the Data tab in Excel. From there, select “From Table/Range” to import the first table into Pow­er Query. When prompt­ed to load the data, choose “Cre­ate Con­nec­tion Only” as we’ll be manip­u­lat­ing this data fur­ther before bring­ing it back to Excel.

Repeat the same process for your grade thresh­olds table. Hav­ing both tables acces­si­ble in Pow­er Query is essen­tial for the next step where we’ll com­bine them through appending.

Appending the Tables

Once both tables are in Pow­er Query, we can com­bine them using the append function:

On the Home tab with­in Pow­er Query Edi­tor, click the Append Queries but­ton (not “Append Queries as New”). In the dia­log box that appears, con­firm that you want to append just the two tables you’ve import­ed. Select the stu­dent table as your pri­ma­ry table and the grades thresh­old table as the table to append.

After append­ing, you’ll notice that the result­ing table con­tains all rows from both orig­i­nal tables. The stu­dent data appears first, fol­lowed by the thresh­old data. This com­bined dataset gives us every­thing we need to assign grades automatically.

Organizing and Sorting the Data

For our grad­ing sys­tem to work prop­er­ly, we need to sort the append­ed data by the num­ber of points. This sort­ing is a cru­cial step because it allows us to arrange scores in ascend­ing order, mak­ing it eas­i­er to assign appro­pri­ate grades.

Click on the col­umn con­tain­ing the points/scores and sort from A to Z (ascend­ing). This action arranges all scores from low­est to high­est, with the grade thresh­olds inte­grat­ed among the stu­dent scores based on their point values.

Because of how the data is now arranged, the grade thresh­olds appear at posi­tions where they nat­u­ral­ly divide the stu­dent scores into dif­fer­ent grade categories.

Assigning Grades Automatically

With our data prop­er­ly sort­ed, we can now use a sim­ple but pow­er­ful tech­nique to assign grades to each stu­dent. The key to this process is the Fill Down com­mand in Pow­er Query.

The grade col­umn ini­tial­ly has val­ues only in the rows that came from our thresh­old table. To assign these grades to stu­dents, select the grade col­umn and nav­i­gate to the Trans­form tab. Click on “Fill” and then “Down.” This action copies each grade down to all rows below it until anoth­er grade is encountered.

As a result, each stu­dent row is auto­mat­i­cal­ly assigned the grade cor­re­spond­ing to their score based on the defined thresh­olds. For exam­ple, if the thresh­old for grade C is 40 points, stu­dents with 40 or more points (but below the next thresh­old) will be assigned a C.

Filtering and Finalizing the Results

After fill­ing down the grades, we need to fil­ter out the thresh­old rows to see only the stu­dent results:

Click the fil­ter but­ton on the col­umn con­tain­ing stu­dent names and select “Fil­ter out emp­ty val­ues” or specif­i­cal­ly select only the rows con­tain­ing actu­al stu­dent data. This leaves us with just the stu­dent infor­ma­tion and their cor­re­spond­ing grades.

With the data fil­tered and grades assigned, we can now load this infor­ma­tion back to Excel. Nav­i­gate to the Home tab and select “Close & Load To.” Choose where you want to place the results in your workbook.

Updating Grades When Thresholds Change

One of the most pow­er­ful aspects of this approach is how eas­i­ly you can adjust grad­ing criteria:

If you need to change the grad­ing thresh­olds, sim­ply update the val­ues in your orig­i­nal thresh­old table. For exam­ple, you might change the min­i­mum score for an A from 90 to 88 points. After mak­ing these changes, save your file and then refresh the query (right-click on your results table and select “Refresh”).

The entire grad­ing process will auto­mat­i­cal­ly run again with the new thresh­olds, updat­ing all stu­dent grades accord­ing­ly. This makes the sys­tem extreme­ly flex­i­ble and saves sig­nif­i­cant time when adjust­ments are needed.

Real-World Application Example

In my video demon­stra­tion, I showed how this sys­tem works with a sam­ple dataset. For instance, Jen­nifer earned 13 points which result­ed in an F grade, Robert received 34 points also result­ing in an F, while Olivia scored 40 points which met the thresh­old for an E grade.

After chang­ing the thresh­olds and refresh­ing the query, the grades updat­ed auto­mat­i­cal­ly to reflect the new stan­dards. This dynam­ic updat­ing capa­bil­i­ty is what makes this approach so valu­able for edu­ca­tors who may need to adjust grad­ing scales based on class per­for­mance or oth­er factors.

How to Clean one column based on the other column

How to Clean one column based on the other column

Pow­er Query offers pow­er­ful solu­tions for clean­ing data across columns by lever­ag­ing infor­ma­tion con­tained in relat­ed columns. In this arti­cle, I’ll demon­strate how to effi­cient­ly remove cryp­tocur­ren­cy sym­bols from a name col­umn when those sym­bols already exist in a sep­a­rate col­umn, even when the sym­bols have vary­ing lengths.

This tech­nique is par­tic­u­lar­ly valu­able when deal­ing with incon­sis­tent data for­mats that require stan­dard­iza­tion for analysis.

Watch my step-by-step tuto­r­i­al video to see these tech­niques in action:

Understanding the Data Cleaning Challenge

When work­ing with finan­cial data, espe­cial­ly cryp­tocur­ren­cy infor­ma­tion, you’ll often encounter datasets where the same infor­ma­tion appears redun­dant­ly across mul­ti­ple columns. In my exam­ple, I have a table con­tain­ing cryp­tocur­ren­cy data where the name col­umn includes both the cryp­tocur­ren­cy name and its sym­bol (like “Bit­coin BTC”), while a sep­a­rate sym­bol col­umn already con­tains just the sym­bol (“BTC”).

The chal­lenge lies in clean­ing the name col­umn by remov­ing these redun­dant sym­bols. What makes this par­tic­u­lar­ly tricky is that cryp­tocur­ren­cy sym­bols vary in length — some might be three char­ac­ters (BTC), oth­ers four (DOGE) or more. This incon­sis­ten­cy means we can’t sim­ply remove a fixed num­ber of char­ac­ters from the end of each name.

Preparing the Data in Power Query

To begin the clean­ing process, we need to load our data into Pow­er Query where we can apply the nec­es­sary transformations:

  1. Nav­i­gate to the Data tab in Excel
  2. Select your data range and choose to load it to Pow­er Query
  3. Remove any unnec­es­sary columns to focus on the task at hand

In my case, I ini­tial­ly removed sev­er­al columns to focus on just the key columns for this demon­stra­tion — the name col­umn con­tain­ing both cryp­tocur­ren­cy names and sym­bols, and the sym­bol col­umn con­tain­ing just the sym­bols. This sim­pli­fied approach allows us to con­cen­trate specif­i­cal­ly on the clean­ing tech­nique with­out distraction.

Solution #1: Using Text After Delimiter

The first solu­tion uses Pow­er Query’s built-in func­tions to extract the rel­e­vant part of the text. This approach is more straight­for­ward and uses the UI-based options:

To imple­ment this solution:

  1. Select the name col­umn in Pow­er Query
  2. Go to the Add Col­umn tab
  3. Choose Extract and then Text After Delimiter

Ini­tial­ly, when using the UI, you might try enter­ing a spe­cif­ic sym­bol like “BTC” as the delim­iter. How­ev­er, this would only work for one cryp­tocur­ren­cy. The real pow­er comes from mod­i­fy­ing the for­mu­la to use the sym­bol col­umn as a reference.

After cre­at­ing the ini­tial for­mu­la, you’ll need to edit it in the for­mu­la bar to make it dynam­ic. Replace the hard­cod­ed delim­iter text with a ref­er­ence to the sym­bol col­umn. This way, Pow­er Query will use each row’s cor­re­spond­ing sym­bol to extract just the cryp­tocur­ren­cy name, regard­less of the sym­bol­’s length.

Solution #2: Using Text.RemoveRange Function

The sec­ond solu­tion is more com­plex but demon­strates the flex­i­bil­i­ty of Pow­er Query’s M lan­guage. For this approach:

  1. Go to the Add Col­umn tab
  2. Select Cus­tom Column
  3. Use the Text.RemoveRange func­tion to pre­cise­ly remove the sym­bol part

The for­mu­la will look some­thing like:

Text.RemoveRange([name], 0, Text.Length([symbol]))

This func­tion works by remov­ing a spe­cif­ic range of char­ac­ters from the text. It requires three parameters:

  • The source text (our name column)
  • The start­ing posi­tion (0‑based in Pow­er Query)
  • The num­ber of char­ac­ters to remove (which we deter­mine using the length of the symbol)

Remem­ber that Pow­er Query uses zero-based index­ing, mean­ing the first char­ac­ter posi­tion is 0, not 1. This is cru­cial when work­ing with text manip­u­la­tion functions.

Finalizing the Transformation

After apply­ing either solu­tion, we need to clean up our query:

  1. Rename the new­ly cre­at­ed col­umn to some­thing mean­ing­ful (like “name”)
  2. Delete the orig­i­nal name col­umn that con­tained the symbols
  3. If need­ed, restore any pre­vi­ous­ly removed columns by delet­ing the “Removed Columns” step
  4. Arrange columns in the desired order by drag­ging them

In my exam­ple, I also demon­strat­ed how to clean anoth­er col­umn (cir­cu­lat­ing sup­ply) that had a sim­i­lar issue but with a sim­pler pat­tern. Since all the unwant­ed sym­bols in this col­umn were pre­ced­ed by a space, I could use the Extract Text Before Delim­iter func­tion with a space as the delimiter.

Loading the Cleaned Data Back to Excel

Once all trans­for­ma­tions are com­plete, we can load the cleaned data back to Excel:

  1. Go to the Home tab in Pow­er Query
  2. Select Close & Load To
  3. Choose whether to load to a new work­sheet or an exist­ing location

In my demon­stra­tion, I loaded the data to cell A13 on the exist­ing work­sheet, which allows for easy com­par­i­son between the orig­i­nal and cleaned data.

Why This Approach Matters

The beau­ty of using Pow­er Query for this type of data clean­ing is that it cre­ates a repeat­able process. If your data source updates, you can sim­ply refresh the query, and all the clean­ing steps will be reap­plied auto­mat­i­cal­ly. This saves sig­nif­i­cant time com­pared to man­u­al clean­ing or using for­mu­las that would need to be copied down for each new set of data.

Addi­tion­al­ly, by lever­ag­ing infor­ma­tion from one col­umn to clean anoth­er, we cre­ate a more robust solu­tion that can han­dle vari­a­tions in the data for­mat with­out break­ing. This approach is far supe­ri­or to hard-cod­ing spe­cif­ic text val­ues or char­ac­ter counts that might change across dif­fer­ent datasets.