Remove Top or Bottom Rows Automatically

Dynamically removing top and bottom rows in Power Query can significantly streamline your data cleaning process when dealing with inconsistent data imports. This technique allows you to automatically eliminate unnecessary header rows or footer information based on specific conditions rather than fixed row counts, making your data transformation process more robust and adaptable to changing source files.

The ability to use conditions rather than static numbers is particularly valuable when working with regularly updated reports that may contain varying amounts of metadata or summary information.

In my video, I demonstrate how to implement this dynamic row removal technique that I learned at an Excel London Meetup:

Understanding the Problem with Static Row Removal

When importing data from text files, CSV files, or other sources, you often encounter extraneous information at the top and bottom of your dataset. These might include title rows, explanatory notes, summary statistics, or footer information that aren’t part of the actual data you need to analyze. Using Power Query’s standard “Remove Top Rows” or “Remove Bottom Rows” functions with a fixed number works fine when your source data structure never changes, but becomes problematic when the number of these rows varies.

In the example shown in my video, we have multiple rows of metadata before the actual header row (which contains “Date” as the first column value), and several rows of additional information at the bottom of the data that need to be removed. There are also some missing values throughout the data that make simple filtering ineffective.

Dynamically Removing Top Rows Based on Conditions

The conventional approach to removing top rows in Power Query involves specifying a fixed number. However, this can be problematic when the number of header rows changes. The dynamic solution involves using a condition rather than a fixed count.

Here’s how to implement this technique:

  1. Go to the Home tab in Power Query Editor
  2. Select “Remove Rows” and then “Remove Top Rows”
  3. Instead of entering a number, modify the formula to use the “each” keyword followed by a condition

The formula will look something like this: each [Column1] <> "Date" (assuming “Date” is the header text in your first column). This tells Power Query to keep removing rows until it finds a row where the first column contains the text “Date”.

The Technical Details of the Table.Skip Function

Behind the scenes, Power Query uses the Table.Skip function when you remove top rows. This function has a hidden capability that isn’t obvious from the user interface — it can accept either a count or a condition parameter.

When using a condition, the syntax changes from simply providing a number to using the format:

each [condition]

The condition is evaluated for each row, starting from the top, and rows are removed until the condition is no longer true. This allows for dynamic adaptation to varying source data structures.
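As a sketch of the resulting M code, the full query might read like this in the Advanced Editor (the file path and step names are illustrative):

```m
let
    // Load the raw file (path is hypothetical)
    Source = Csv.Document(File.Contents("C:\Data\report.csv")),
    // Keep skipping rows from the top while the first column
    // is not "Date"; Table.Skip stops at the first row where
    // the condition is false, i.e. the real header row
    RemovedTopRows = Table.Skip(Source, each [Column1] <> "Date")
in
    RemovedTopRows
```

Because the condition replaces a fixed count, the same query keeps working when the number of metadata rows changes between refreshes.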

Dynamically Removing Bottom Rows

Similarly, we can apply the same concept to remove rows from the bottom of our dataset. This is particularly useful when dealing with files that contain summary information, notes, or other footer data that should be excluded from analysis.

The process for removing bottom rows dynamically is:

  1. Go to the Home tab in Power Query Editor
  2. Select “Remove Rows” and then “Remove Bottom Rows”
  3. Replace the static number with a condition using the “each” keyword

For example, you might use a formula like: each [Merchant] = "" to remove rows from the bottom where the Merchant column contains an empty text string. Or you might use each [Revenue] = null to remove rows where the Revenue column contains null values.
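Behind the scenes this uses Table.RemoveLastN, which, like Table.Skip, accepts either a count or a condition. A minimal sketch, assuming a Merchant column exists:

```m
// Remove rows from the bottom while the Merchant column is empty;
// evaluation starts at the last row and stops at the first row
// where the condition is false
RemovedBottomRows = Table.RemoveLastN(Source, each [Merchant] = "")
```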

Handling Different Types of Empty Values

When working with bottom rows, it’s important to understand the different types of empty values that might appear in your data:

  • Empty text strings — represented by "" in formulas
  • Null values — represented by null in formulas
  • Missing values — which might be null or empty depending on the data source

In the video demonstration, I show how to handle both empty text strings and null values as conditions for removing bottom rows. The key is to identify which column and which type of empty value reliably indicates the footer section of your data.
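If the footer rows are inconsistent — some holding empty strings, some holding nulls — both conditions can be combined in a single step (the column names here are assumptions for illustration):

```m
// Treat a row as footer content when Merchant is an empty string
// or Revenue is null; in M, = compares against null without error
RemovedFooter = Table.RemoveLastN(Source,
    each [Merchant] = "" or [Revenue] = null)
```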

Practical Applications and Benefits

This dynamic row removal technique is particularly valuable in several scenarios:

When working with regularly updated reports where the structure might change slightly between versions, this approach ensures your Power Query solution remains robust. It’s also helpful when dealing with data exports from systems that include varying amounts of metadata or when consolidating multiple files that might have different header structures.

The major benefits include:

  • More resilient data transformation processes that don’t break when source formats change slightly
  • Reduced need for manual intervention when processing new data
  • Ability to handle files with inconsistent structure automatically
  • Greater flexibility compared to static row removal or simple filtering

This technique demonstrates the power of the M language in Power Query, allowing for solutions that go beyond what’s immediately available in the user interface. By understanding and leveraging these more advanced capabilities, you can create more robust data transformation processes.

Important Considerations

When implementing this technique, keep in mind a few important points:

The condition you use must reliably identify the boundary between the rows you want to keep and those you want to remove. Choose column values that are consistently present (or consistently absent) at these boundaries. Also be aware that if your condition never evaluates to false, you could potentially remove all rows from your dataset, so testing with representative sample data is essential.

Additionally, remember that this technique works even when you have missing values in your actual data. As shown in the video, the rows are only removed when they match the specific condition you’ve defined, allowing rows with some missing values to be retained as long as they don’t match your removal condition.

How to Split Text by Position in Power Query

Splitting text by length in Power Query can transform cluttered data into organized, usable information without complex formulas. This technique allows you to break down text strings into separate columns based on specific character positions, making it especially useful when dealing with fixed-width data formats that contain multiple pieces of information.

The real power of this approach lies in its ability to handle irregular splitting requirements where each section has a different length.

In my video, I demonstrate the complete process of splitting text by length in Power Query:

Understanding the Data Structure

When working with text data that needs to be split, it’s essential to first analyze the structure of your text. In the demonstration, I work with a dataset where each text string contains several pieces of information with varying lengths:

  • Person information (30 characters)
  • Delimiter characters (semicolons, pipes) that need to be removed
  • Date information (appearing as numbers)
  • Currency values in different formats

The challenge lies in the fact that each section has a different length, making standard split functions less effective. This is precisely where Power Query’s split by position feature becomes invaluable.

Importing Data to Power Query

The first step in the process is to import your data into Power Query. This can be done easily by selecting your data table and using the From Table/Range option in the Data tab. Once your data is in Power Query, you’ll have access to powerful transformation tools that aren’t available in standard Excel.

Power Query provides a user-friendly interface where you can see your data and apply various transformations step by step. This visual approach makes it easier to track changes and ensure that your data is being processed correctly.

Splitting Text by Position

With the data imported into Power Query, we can now split the text column based on specific positions. Here’s how to do it:

  1. Go to the Home tab in Power Query
  2. Select Split Column and choose “By Positions” (not “By Number of Characters”)
  3. Enter the specific positions where you want to split the text

In the example, I needed to split at positions 0, 30, 31, 39, and 41. It’s important to note that Power Query counts from zero for the first position, not one. These numbers represent the starting points for each section of text.

After pressing OK, Power Query creates new columns based on these position splits. The result is five separate columns, each containing a distinct part of the original text string.
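The generated step uses Table.SplitColumn with Splitter.SplitTextByPositions. A sketch with the positions from the example, assuming the source column is named Column1:

```m
// Split Column1 at character positions 0, 30, 31, 39 and 41,
// producing five new columns named Text.1 to Text.5
SplitByPositions = Table.SplitColumn(
    Source,
    "Column1",
    Splitter.SplitTextByPositions({0, 30, 31, 39, 41}),
    {"Text.1", "Text.2", "Text.3", "Text.4", "Text.5"}
)
```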

Refining the Split Data

Renaming and Removing Columns

After splitting the text, we need to organize our data by giving meaningful names to the important columns and removing unnecessary ones. In the formula bar, I renamed the columns to reflect their content:

  • “Text.1” became “Person”
  • “Text.3” became “Date”
  • “Text.5” became “Payout”

For columns containing delimiters or other unwanted information (in this case, “Text.2” and “Text.4”), we can simply delete them by selecting them with Ctrl+click and pressing the Delete key.
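In M, these two clean-up actions appear as a rename step followed by a removal step — roughly like these consecutive steps from a let expression (step names are illustrative):

```m
// Give meaningful names to the columns worth keeping
RenamedColumns = Table.RenameColumns(SplitByPositions,
    {{"Text.1", "Person"}, {"Text.3", "Date"}, {"Text.5", "Payout"}}),
// Drop the columns that only held delimiter characters
RemovedColumns = Table.RemoveColumns(RenamedColumns, {"Text.2", "Text.4"})
```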

Correcting Data Types

Once we have our properly named columns, we need to ensure that each column has the correct data type. In the demonstration:

  • The “Date” column was initially recognized as an integer and needed to be converted to the date data type
  • The “Payout” column contained currency values in different formats that needed proper interpretation

Power Query can automatically detect and convert data types, but sometimes manual intervention is necessary. By clicking on the data type icon in the column header, you can force Power Query to interpret the data as a specific type.
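Setting a type from the column header generates a Table.TransformColumnTypes step; a sketch for the two columns above:

```m
// Convert Date to a date and Payout to a currency type
ChangedTypes = Table.TransformColumnTypes(RemovedColumns,
    {{"Date", type date}, {"Payout", Currency.Type}})
```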

Working with Regional Settings

An important aspect of working with dates and currency values is understanding how regional settings affect data interpretation. In Power Query, you can adjust these settings to match the format of your data.

To access these settings, go to:

  1. Options and Settings
  2. Query Options
  3. Current Workbook
  4. Regional Settings

In my demonstration, the regional settings were set to “English (United Kingdom)” which allowed Power Query to correctly interpret the pound (£) currency symbols regardless of their position in the text. If your data uses different regional formats, you can adjust these settings to match your needs.

For individual columns, you can also click on the data type icon and select “Using Locale” to specify both the data type and regional format for that particular column. This gives you fine-grained control over how Power Query interprets your data.
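Choosing “Using Locale” adds a culture code as the third argument of Table.TransformColumnTypes; for UK-formatted currency that might look like:

```m
// Interpret Payout using the English (United Kingdom) locale
ChangedWithLocale = Table.TransformColumnTypes(Source,
    {{"Payout", Currency.Type}}, "en-GB")
```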

Finalizing the Transformation

After completing all the necessary transformations — splitting the text, renaming columns, removing unnecessary columns, and setting the correct data types — the final step is to load the transformed data back into Excel.

From the Home tab, select “Close & Load” to export your properly organized table back to Excel. The result is a clean, structured dataset with separate columns for person information, dates, and payment values, all with the appropriate data types.

This technique of splitting text by position in Power Query is particularly useful when dealing with fixed-width data exports from legacy systems, standardized report outputs, or any situation where text strings contain multiple data points at known positions. By mastering this approach, you can quickly transform dense, combined text fields into organized and usable data.

How to Import Fixed-Width Files to Excel

Importing fixed-width column data to Excel can be challenging when each column has a set number of characters, often resulting in unnecessary spaces. In this article, I’ll show you how to properly import fixed-width CSV files using Power Query, ensuring clean data without those problematic extra spaces.

This technique is essential for handling structured data exports from legacy systems or standardized reports.

Understanding Fixed-Width Column Data

Fixed-width column data is a specific format where each column occupies exactly the same number of characters in every row. Unlike delimited formats (like comma-separated values), fixed-width files allocate a predetermined number of spaces for each field. This means shorter values get padded with spaces to maintain the fixed width. For example, the date column might always be 10 characters wide, the merchant column 20 characters, and so on.

This format is common in older systems, government data, and standardized reports, but it presents unique challenges when importing to Excel. The main issue is that these extra spaces can cause problems with data analysis and visualization if not properly handled during the import process.

Importing Fixed-Width Data Using Power Query

Power Query provides excellent tools for handling fixed-width column data. Here’s the process to properly import this type of data:

Initial Import Steps

To begin importing fixed-width data, follow these steps:

  1. Go to the Data tab in Excel
  2. Click on “From Text/CSV” option
  3. Locate and select your fixed-width file

When the Power Query Editor opens, you might need to adjust the window size to see all available options, as sometimes the third column of settings may be hidden if the window is too narrow. Make sure to expand the window to see all import settings.

Configuring Import Settings

During the import process, you’ll need to configure several important settings:

First, check the file origin (encoding) — UTF‑8 works well for most modern files. Then, verify that the delimiter is set correctly. For fixed-width files, Power Query typically detects this format automatically. You should see “Fixed Width” selected rather than comma, tab, or other delimiters.

The data type detection setting is another crucial choice. You have three options:

  • Based on first 200 rows
  • Based on entire dataset
  • Do not detect data types

In most cases with fixed-width data, it’s better to select “Do not detect data types” initially and handle data type conversion after cleaning the spaces. This prevents Power Query from making assumptions about your data types that might be affected by the extra spaces.
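If Power Query does not detect the fixed-width layout automatically, one hedged fallback is to read the file as raw lines and split by position yourself — the path, encoding, and split positions below are purely illustrative:

```m
let
    // Read the file as a list of text lines (65001 = UTF-8)
    RawLines = Lines.FromBinary(File.Contents("C:\Data\export.txt"),
        null, null, 65001),
    // Turn the list into a single-column table
    AsTable = Table.FromColumns({RawLines}, {"Line"}),
    // Split each line at fixed character positions
    SplitFixedWidth = Table.SplitColumn(AsTable, "Line",
        Splitter.SplitTextByPositions({0, 10, 30, 50}))
in
    SplitFixedWidth
```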

Handling Extra Spaces in Power Query

The key challenge with fixed-width data is managing the extra spaces that pad each column. When examining the data in Power Query, you’ll notice these extra spaces in both the column headers and text fields. These spaces are necessary in the original file to maintain the fixed-width format, but they’re problematic for analysis in Excel.

For example, when clicking on a column header, you might see something like “Date ” with several spaces after the actual name. Similarly, text fields like product names or merchant names will have trailing spaces to fill their allocated width.

Removing Extra Spaces

To clean up the data properly:

  1. Select all columns in the Power Query Editor by clicking the first column header and then Shift-clicking the last column
  2. Go to the Transform tab
  3. Click on “Format” and select “Trim”

This important step removes all leading and trailing white spaces from every cell in your dataset while preserving spaces that are actually part of your data (like spaces between words in product names). After trimming the data, you’ll notice a significant improvement in how your data looks.
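Rather than listing every column by name, the trim can also be written with Table.TransformColumns using its default-transformation argument, which keeps the step valid even if the column list changes — a sketch:

```m
// Apply Text.Trim to every column: the empty list means
// "no per-column overrides", so Text.Trim becomes the
// default transformation applied to all columns
TrimmedText = Table.TransformColumns(Source, {}, Text.Trim)
```

This works here because, with type detection switched off, every column is still text at this stage.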

Setting Headers and Data Types

After removing the extra spaces, you can proceed with configuring headers and data types:

Use the “Use First Row as Headers” option (found in the Transform tab or by right-clicking on the first row) to set your headers properly. Now that the spaces have been removed, your column names will appear clean and professional.

With clean data, you can now set appropriate data types for each column. For example, date columns should be formatted as dates, and revenue or cost columns as currency or decimal numbers. Power Query will help identify appropriate types, but you can manually change them by clicking the data type icon in each column header.
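In M, these two actions are a Table.PromoteHeaders step followed by a Table.TransformColumnTypes step; the column names here are assumptions for illustration:

```m
// Promote the first (now clean) row to column headers
PromotedHeaders = Table.PromoteHeaders(TrimmedText, [PromoteAllScalars = true]),
// Assign proper types once the padding spaces are gone
ChangedTypes = Table.TransformColumnTypes(PromotedHeaders,
    {{"Date", type date}, {"Revenue", Currency.Type}})
```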

Loading Data to Excel

The final step is to load your cleaned data into Excel:

  1. Click “Close & Load” (or “Close & Load To” for more options)
  2. Select whether to load as a table or to a specific location
  3. Choose an existing worksheet or create a new one

When loading the data, you have the option to add it to the data model if you plan to use it with Power Pivot, but this isn’t necessary for basic Excel analysis. One important thing to note is that while Power Query may detect currency data types, this doesn’t automatically apply Excel’s currency formatting to your table. You’ll need to apply appropriate number formatting in Excel after importing.

Fortunately, when you refresh your data in the future, Excel will maintain any number formatting you’ve applied to the table, making future updates seamless.

Benefits of Using Power Query for Fixed-Width Data

Using Power Query for importing fixed-width data offers several advantages over other methods:

The ability to transform data before loading it to Excel saves significant time compared to manual cleanup after import. The process is also repeatable — if you receive updated versions of the same file, you can simply refresh your query rather than repeating the entire import and cleanup process.

Additionally, Power Query maintains a record of all transformations applied, making your data preparation process transparent and easily adjustable if needed. This approach ensures consistency in how your fixed-width data is processed each time.