Importing fixed-width column data to Excel can be challenging when each column has a set number of characters, often resulting in unnecessary spaces. In this article, I’ll show you how to properly import fixed-width text files using Power Query (via Excel’s From Text/CSV command), ensuring clean data without those problematic extra spaces.
This technique is essential for handling structured data exports from legacy systems or standardized reports.
Understanding Fixed-Width Column Data
Fixed-width column data is a specific format where each column occupies exactly the same number of characters in every row. Unlike delimited formats (like comma-separated values), fixed-width files allocate a predetermined number of spaces for each field. This means shorter values get padded with spaces to maintain the fixed width. For example, the date column might always be 10 characters wide, the merchant column 20 characters, and so on.
This format is common in older systems, government data, and standardized reports, but it presents unique challenges when importing to Excel. The main issue is that these extra spaces can cause problems with data analysis and visualization if not properly handled during the import process.
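For illustration, a hypothetical fixed-width export might look like the snippet below (invented values; here the date field is 10 characters wide, the merchant field 20, and the income field 8, so shorter values are padded with spaces):

```
Date      Merchant            Income
2024-01-05Coffee Corner          12.50
2024-01-06Book Nook              40.00
```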
Importing Fixed-Width Data Using Power Query
Power Query provides excellent tools for handling fixed-width column data. Here’s the process to properly import this type of data:
Initial Import Steps
To begin importing fixed-width data, follow these steps:
Go to the Data tab in Excel
Click on “From Text/CSV” option
Locate and select your fixed-width file
When the Power Query Editor opens, you might need to adjust the window size to see all available options, as sometimes the third column of settings may be hidden if the window is too narrow. Make sure to expand the window to see all import settings.
Configuring Import Settings
During the import process, you’ll need to configure several important settings:
First, check the file origin (encoding) — UTF‑8 works well for most modern files. Then, verify that the delimiter is set correctly. For fixed-width files, Power Query typically detects this format automatically: you should see “Fixed Width” selected rather than comma, tab, or another delimiter.
The data type detection setting is another crucial choice. You have three options:
Based on first 200 rows
Based on entire dataset
Do not detect data types
In most cases with fixed-width data, it’s better to select “Do not detect data types” initially and handle data type conversion after cleaning the spaces. This prevents Power Query from making assumptions about your data types that might be affected by the extra spaces.
Handling Extra Spaces in Power Query
The key challenge with fixed-width data is managing the extra spaces that pad each column. When examining the data in Power Query, you’ll notice these extra spaces in both the column headers and text fields. These spaces are necessary in the original file to maintain the fixed-width format, but they’re problematic for analysis in Excel.
For example, when clicking on a column header, you might see something like “Date ” with several spaces after the actual name. Similarly, text fields like product names or merchant names will have trailing spaces to fill their allocated width.
Removing Extra Spaces
To clean up the data properly:
Select all columns in the Power Query Editor by clicking the first column header and then Shift-clicking the last column
Go to the Transform tab
Click on “Format” and select “Trim”
This important step removes all leading and trailing whitespace from every cell in your dataset while preserving spaces that are actually part of your data (like spaces between words in product names). After trimming, you’ll notice a significant improvement in how your data looks.
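In M, the step the Trim command generates looks roughly like this sketch (the step name Source and the column names are placeholders for whatever your file contains — the UI emits one entry per selected column):

```
// Apply Text.Trim to each selected column
= Table.TransformColumns(
    Source,
    {{"Date",     Text.Trim, type text},
     {"Merchant", Text.Trim, type text},
     {"Income",   Text.Trim, type text}})
```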
Setting Headers and Data Types
After removing the extra spaces, you can proceed with configuring headers and data types:
Use the “Use First Row as Headers” option (found in the Transform tab or by right-clicking on the first row) to set your headers properly. Now that the spaces have been removed, your column names will appear clean and professional.
With clean data, you can now set appropriate data types for each column. For example, date columns should be formatted as dates, and revenue or cost columns as currency or decimal numbers. Power Query will help identify appropriate types, but you can manually change them by clicking the data type icon in each column header.
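The corresponding M step for setting types might look like the following sketch (column names are assumptions based on the example; Currency.Type is M’s fixed-decimal currency type):

```
// Convert the cleaned text columns to proper data types
= Table.TransformColumnTypes(
    #"Trimmed Text",
    {{"Date",    type date},
     {"Revenue", Currency.Type},
     {"Cost",    Currency.Type}})
```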
Loading Data to Excel
The final step is to load your cleaned data into Excel:
Click “Close & Load” (or “Close & Load To” for more options)
Select whether to load as a table or to a specific location
Choose an existing worksheet or create a new one
When loading the data, you have the option to add it to the data model if you plan to use it with Power Pivot, but this isn’t necessary for basic Excel analysis. One important thing to note is that while Power Query may detect currency data types, this doesn’t automatically apply Excel’s currency formatting to your table. You’ll need to apply appropriate number formatting in Excel after importing.
Fortunately, when you refresh your data in the future, Excel will maintain any number formatting you’ve applied to the table, making future updates seamless.
Benefits of Using Power Query for Fixed-Width Data
Using Power Query for importing fixed-width data offers several advantages over other methods:
The ability to transform data before loading it to Excel saves significant time compared to manual cleanup after import. The process is also repeatable — if you receive updated versions of the same file, you can simply refresh your query rather than repeating the entire import and cleanup process.
Additionally, Power Query maintains a record of all transformations applied, making your data preparation process transparent and easily adjustable if needed. This approach ensures consistency in how your fixed-width data is processed each time.
Merging tables in Power Query with summarized data provides a powerful way to consolidate information without overwhelming detail. By aggregating invoice details such as counts, sums, averages, and medians, you can create concise reports that highlight key insights while maintaining the original table structure.
This technique is particularly valuable when dealing with sales data where each invoice contains multiple line items that need to be condensed.
In my video tutorial, I demonstrate how to effectively merge tables with summarization in Power Query:
Understanding the Problem: Merging Tables with Summarization
When working with relational data like invoices and their detailed items, we often need to combine information from multiple tables. However, simply merging tables can result in redundant data and unnecessary complexity. The challenge is to merge data while simultaneously summarizing the details to provide meaningful insights.
In this demonstration, I work with two tables: one containing basic invoice information and another with detailed line items for each invoice. The goal is to enhance the invoice table with summarized information from the details table, such as the count of items, average quantities, sum of prices, and even statistical measures like medians.
Loading Tables into Power Query
The first step in our process is to load both tables into Power Query for manipulation. This gives us access to the powerful data transformation capabilities that allow for sophisticated merging and summarization.
To begin, I navigate to the Data tab in Excel and select “From Table/Range” to import our first table into Power Query. For this initial details table, I choose to load it only as a connection rather than importing it directly into the worksheet. This is done by selecting “Close & Load To” and then choosing the “Only Create Connection” option.
After loading the details table, I follow the same process to load the invoice table. This time, however, we’ll perform our merge operation directly within this query rather than creating a new one.
Merging the Tables
With both tables loaded into Power Query, we can now merge them using the common invoice ID field. This is where the power of our approach becomes evident.
To perform the merge:
Select the “Merge Queries” command from the ribbon
Choose the second table (details) to merge with
Select matching columns for merging (invoice ID in both tables)
Choose “Left Outer Join” as the join kind to ensure all invoices are included
After pressing OK, Power Query creates a new column in our invoice table that contains the matching rows from the details table. You’ll notice that for each invoice, we now have a nested table showing all the corresponding detail rows. For example, some invoices might have two detail rows, others three, and some even five or more.
The Key Difference: Aggregating Instead of Expanding
The critical step that differentiates this approach from a standard merge is what we do next. Instead of expanding these nested tables (which would create duplicate invoice rows), we want to aggregate the information they contain.
When we click on the expand button (the small triangle) in the column header, we get options to expand or aggregate. By choosing to aggregate, we can perform calculations across all the detail rows for each invoice.
Creating Summarized Metrics
The power of this technique lies in the variety of aggregations we can perform on the detailed data. For our invoice example, I demonstrate several useful metrics:
Count of items: Simple count of how many line items exist for each invoice
Average quantity: Instead of summing quantities, we calculate the average number of items per line
Sum of unit price: Total value of the invoice
Median of unit price: Statistical measure showing the middle value of all unit prices on the invoice
To create these aggregations, I select the appropriate options from the dropdown menu when expanding the nested table column. By unchecking “Use original column name as prefix,” we get cleaner column names in our results.
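Behind the scenes, the merge and the aggregation correspond roughly to this M fragment (shown as steps inside a let expression; the query names Invoices and Details and the column names are assumptions for this example):

```
// Nested join: adds a "Details" column holding the matching
// detail rows for each invoice
Merged = Table.NestedJoin(
    Invoices, {"InvoiceID"},
    Details,  {"InvoiceID"},
    "Details", JoinKind.LeftOuter),

// Aggregate the nested tables instead of expanding them
Aggregated = Table.AggregateTableColumn(
    Merged, "Details",
    {{"Item",      List.Count,   "Count"},    // line items per invoice
     {"Quantity",  List.Average, "Average Quantity"},
     {"UnitPrice", List.Sum,     "Sum of Unit Price"},
     {"UnitPrice", List.Median,  "Median of Unit Price"}})
```

Note that the same source column (here UnitPrice) can feed several aggregations, which is how we get both the sum and the median in one step.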
Refining the Results
After creating our aggregated columns, we may want to refine the column names for clarity. This can be done directly in the formula bar in Power Query. For example, I rename one of the columns to “Count” to better reflect its purpose.
It’s also important to check the data types of our new columns. Power Query assigns data types automatically, but they may need adjustment. In our case, the “Count” column is better set as a whole number rather than a decimal, since counts will always be whole numbers.
When working with monetary values like the sum of unit prices, Power Query may preserve the currency data type. However, it’s worth noting that this data type information doesn’t automatically translate to formatting in Excel. We’ll need to apply currency formatting in Excel after loading the data.
Loading the Final Results
Once we’re satisfied with our merged and summarized table, we can load it back to Excel. From the Home tab in Power Query, I select “Close and Load To” and choose to create a new table in the existing worksheet.
The final result is an enhanced invoice table that includes aggregated information from the details table without duplicating invoice records. This gives us a concise view of our data with valuable insights such as the number of items per invoice, average quantities, total values, and statistical measures.
This technique is particularly valuable for financial analysis, sales reporting, and any situation where you need to combine master records with summarized information from related detail records. By leveraging Power Query’s merging and aggregation capabilities, we can create more informative reports while maintaining a clean, non-redundant data structure.
Sales data analysis often requires summarizing results by specific categories to gain valuable business insights. In this article, I demonstrate how to efficiently calculate sales summaries using Power Query’s grouping functionality, providing a powerful alternative to traditional Excel formulas for analyzing merchant performance, income totals, and product distribution.
The grouping technique shown allows for multiple aggregations in a single operation, dramatically improving efficiency when working with larger datasets.
Watch my detailed tutorial video below to follow along with each step:
Understanding the Sales Summary Challenge
When analyzing sales data, we often need to create summaries that show performance metrics grouped by specific categories. In this demonstration, I’m working with a dataset containing sales transactions that include merchants, products, dates, and income values. My goal is to summarize this data to show several key metrics for each merchant:
Count of sales transactions
Sum of income
Average income per transaction
List of unique products sold
While these calculations can be done using Excel formulas (which I’ve prepared in the worksheet for comparison), using Power Query offers significant advantages, especially for larger datasets or repetitive analysis tasks.
Importing Data into Power Query
The first step in our process is to bring the data into Power Query where we can perform the grouping operations:
Select the data table in Excel
Navigate to the Data tab in the Excel ribbon
Click “From Table/Range” to import the data into Power Query
Once the data is loaded into the Power Query Editor, we can see all our sales records and begin the grouping process to create our summary.
Creating Groups in Power Query
To summarize our data by merchant, we need to use the Group By function in Power Query:
Select the Merchant column in the Power Query Editor
Go to the Home tab and click “Group By”
The default Group By dialog offers basic functionality with just one column for grouping and a single aggregation. For our needs, we require the Advanced option since we want to create multiple aggregations based on the merchant column.
Setting Up Advanced Grouping
In the advanced Group By dialog, we need to configure multiple aggregations:
Set the grouping column to Merchant
Add the following aggregations:
“Count Rows” for the number of sales (named “Count”)
“Sum” of the Income column (named “Sum”)
“Average” of the Income column (named “Average”)
“All Rows” to capture all related data (named “List”)
After clicking OK, Power Query creates a new table with our merchant groups and the specified aggregations. The Count, Sum, and Average values match the Excel formulas we prepared earlier, confirming that our grouping is working correctly.
Customizing the Product List
The “All Rows” aggregation creates a nested table for each merchant containing all their sales records. However, we only want a list of unique products sold by each merchant, not all the row data. Since the Group By dialog doesn’t offer this specific option, we need to modify the M formula directly.
Modifying the M Formula
To customize our product list, we need to edit the M code that Power Query generated:
The default “All Rows” aggregation produces tables with all columns for each merchant
We need to change this to extract only the Product column values
Furthermore, we need to eliminate duplicates from these lists
The original formula includes {_} which references the entire row. We need to replace this with a reference to just the Product column: [Product]. Additionally, we need to wrap this in the List.Distinct function to remove duplicates:
The modified formula section should look like: List.Distinct([Product])
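Putting it together, the grouped step with the edited aggregation might look like this in M (a sketch assuming the column names from the example):

```
= Table.Group(Source, {"Merchant"},
    {{"Count",   each Table.RowCount(_),        Int64.Type},
     {"Sum",     each List.Sum([Income]),       type number},
     {"Average", each List.Average([Income]),   type number},
     // edited "All Rows" aggregation: keep only the unique
     // product names sold by each merchant
     {"List",    each List.Distinct([Product]), type list}})
```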
Transforming the Product List into Text
Now that we have lists of unique products for each merchant, we can transform these lists into comma-separated text values for better readability:
Click on the expand button in the List column header
Select “Extract Values…” rather than “Expand to New Rows”
Choose a custom delimiter (comma followed by space: “, ”)
Click OK
This transforms our product lists into readable text strings showing all the unique products each merchant has sold, separated by commas.
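The Extract Values command generates M along these lines (step and column names are assumptions), converting each list to a single comma-separated string:

```
// Join each merchant's list of unique products with ", "
= Table.TransformColumns(
    #"Grouped Rows",
    {{"List", each Text.Combine(List.Transform(_, Text.From), ", "),
      type text}})
```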
Loading the Results Back to Excel
Once we’ve created our summary with all the required metrics, we can load the results back to Excel:
Go to the Home tab and click “Close & Load To…”
Select “Existing Worksheet” and choose where to place the results
Click OK
After loading, we can format the Sum and Average columns as currency to improve readability. The final result is a clean summary table showing the count of sales, total income, average income, and unique products for each merchant.
Benefits of Using Power Query for Grouping
While our example used a relatively small dataset, the advantages of using Power Query for this type of analysis become even more apparent with larger data volumes:
Efficiency — Performs calculations much faster than Excel formulas for large datasets
Repeatability — The query can be refreshed when data changes without redoing formulas
Flexibility — Allows for complex aggregations and transformations that would be difficult with Excel formulas
Data cleaning — Integrates with other Power Query transformations for comprehensive data preparation
By mastering this grouping technique in Power Query, you can create sophisticated sales summaries and other data aggregations with just a few clicks, saving significant time and effort compared to traditional Excel methods.
Using appending in Power Query offers a streamlined approach to automatically grade student exams based on predefined thresholds. This method eliminates manual grading errors and saves significant time when processing large sets of student scores, providing an efficient way to manage academic assessments.
The technique combines student scores with grade thresholds to generate accurate results that can be easily updated when grading criteria change.
Watch my step-by-step tutorial on how to implement this grading system using Power Query:
Understanding the Grading Process with Power Query
Power Query’s appending feature allows us to combine the student scores and the grade thresholds into a single table. The appending process essentially stacks tables on top of each other, creating one dataset that we can manipulate to assign grades automatically. This approach is particularly useful in educational settings where instructors need to process multiple exam results efficiently.
The primary advantage of using this method is that when grading criteria change, you don’t need to manually reassign grades to each student. Simply update the threshold values in your source table, refresh the query, and all grades will automatically update to reflect the new standards.
Required Resources
Before starting the grading process, you’ll need two essential tables:
A table containing student names and their exam points
A table defining grade thresholds (what point ranges correspond to which letter grades)
These tables will serve as the foundation for our automated grading system. The structure of these tables doesn’t need to be complex—just clearly organized with appropriate headers to identify the data.
Setting Up the Tables in Power Query
To begin implementing the grading system, we need to import both tables into Power Query. This process allows us to manipulate and combine the data efficiently.
First, with the student scores table selected, navigate to the Data tab in Excel. From there, select “From Table/Range” to import the first table into Power Query. When prompted to load the data, choose “Only Create Connection”, as we’ll be manipulating this data further before bringing it back to Excel.
Repeat the same process for your grade thresholds table. Having both tables accessible in Power Query is essential for the next step where we’ll combine them through appending.
Appending the Tables
Once both tables are in Power Query, we can combine them using the append function:
On the Home tab within Power Query Editor, click the Append Queries button (not “Append Queries as New”). In the dialog box that appears, confirm that you want to append just the two tables you’ve imported. Select the student table as your primary table and the grades threshold table as the table to append.
After appending, you’ll notice that the resulting table contains all rows from both original tables. The student data appears first, followed by the threshold data. This combined dataset gives us everything we need to assign grades automatically.
Organizing and Sorting the Data
For our grading system to work properly, we need to sort the appended data by the number of points. This sorting is a crucial step because it allows us to arrange scores in ascending order, making it easier to assign appropriate grades.
Click on the column containing the points/scores and sort ascending (smallest to largest). This arranges all scores from lowest to highest, with the grade thresholds integrated among the student scores based on their point values.
Because of how the data is now arranged, the grade thresholds appear at positions where they naturally divide the student scores into different grade categories.
Assigning Grades Automatically
With our data properly sorted, we can now use a simple but powerful technique to assign grades to each student. The key to this process is the Fill Down command in Power Query.
The grade column initially has values only in the rows that came from our threshold table. To assign these grades to students, select the grade column and navigate to the Transform tab. Click on “Fill” and then “Down.” This action copies each grade down to all rows below it until another grade is encountered.
As a result, each student row is automatically assigned the grade corresponding to their score based on the defined thresholds. For example, if the threshold for grade E is 40 points, students with 40 or more points (but below the next threshold) will be assigned an E.
Filtering and Finalizing the Results
After filling down the grades, we need to filter out the threshold rows to see only the student results:
Click the filter button on the column containing student names and remove the empty values (the threshold rows have no student name), or specifically select only the rows containing actual student data. This leaves us with just the student information and their corresponding grades.
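The whole sequence of steps above can be sketched in M as follows (query and column names such as Students, Thresholds, Points, Grade, and Name are assumptions; in practice each step is generated by the UI commands described):

```
let
    // Stack student scores and grade thresholds into one table
    Appended = Table.Combine({Students, Thresholds}),
    // Sort ascending by points so each threshold row precedes
    // the student rows it applies to
    Sorted = Table.Sort(Appended, {{"Points", Order.Ascending}}),
    // Copy each grade down until the next threshold row
    Filled = Table.FillDown(Sorted, {"Grade"}),
    // Keep only student rows (threshold rows have no name)
    Result = Table.SelectRows(Filled, each [Name] <> null)
in
    Result
```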
With the data filtered and grades assigned, we can now load this information back to Excel. Navigate to the Home tab and select “Close & Load To.” Choose where you want to place the results in your workbook.
Updating Grades When Thresholds Change
One of the most powerful aspects of this approach is how easily you can adjust grading criteria:
If you need to change the grading thresholds, simply update the values in your original threshold table. For example, you might change the minimum score for an A from 90 to 88 points. After making these changes, save your file and then refresh the query (right-click on your results table and select “Refresh”).
The entire grading process will automatically run again with the new thresholds, updating all student grades accordingly. This makes the system extremely flexible and saves significant time when adjustments are needed.
Real-World Application Example
In my video demonstration, I showed how this system works with a sample dataset. For instance, Jennifer earned 13 points which resulted in an F grade, Robert received 34 points also resulting in an F, while Olivia scored 40 points which met the threshold for an E grade.
After changing the thresholds and refreshing the query, the grades updated automatically to reflect the new standards. This dynamic updating capability is what makes this approach so valuable for educators who may need to adjust grading scales based on class performance or other factors.
Power Query offers powerful solutions for cleaning data across columns by leveraging information contained in related columns. In this article, I’ll demonstrate how to efficiently remove cryptocurrency symbols from a name column when those symbols already exist in a separate column, even when the symbols have varying lengths.
This technique is particularly valuable when dealing with inconsistent data formats that require standardization for analysis.
Watch my step-by-step tutorial video to see these techniques in action:
Understanding the Data Cleaning Challenge
When working with financial data, especially cryptocurrency information, you’ll often encounter datasets where the same information appears redundantly across multiple columns. In my example, I have a table containing cryptocurrency data where the name column includes both the cryptocurrency name and its symbol (like “Bitcoin BTC”), while a separate symbol column already contains just the symbol (“BTC”).
The challenge lies in cleaning the name column by removing these redundant symbols. What makes this particularly tricky is that cryptocurrency symbols vary in length — some might be three characters (BTC), others four (DOGE) or more. This inconsistency means we can’t simply remove a fixed number of characters from the end of each name.
Preparing the Data in Power Query
To begin the cleaning process, we need to load our data into Power Query where we can apply the necessary transformations:
Navigate to the Data tab in Excel
Select your data range and choose to load it to Power Query
Remove any unnecessary columns to focus on the task at hand
In my case, I removed several columns to focus on the two that matter for this demonstration — the name column containing both cryptocurrency names and symbols, and the symbol column containing just the symbols. This lets us concentrate on the cleaning technique without distraction.
Solution #1: Using Text After Delimiter
The first solution uses Power Query’s built-in functions to extract the relevant part of the text. This approach is more straightforward and uses the UI-based options:
To implement this solution:
Select the name column in Power Query
Go to the Add Column tab
Choose Extract and then Text After Delimiter
Initially, when using the UI, you might try entering a specific symbol like “BTC” as the delimiter. However, this would only work for one cryptocurrency. The real power comes from modifying the formula to use the symbol column as a reference.
After creating the initial formula, you’ll need to edit it in the formula bar to make it dynamic. Replace the hardcoded delimiter text with a reference to the symbol column. This way, Power Query will use each row’s corresponding symbol to extract just the cryptocurrency name, regardless of the symbol’s length.
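The edited formula might look like this sketch (assuming a “Name” column and a “Symbol” column, as in the example):

```
// Uses each row's own symbol as the delimiter, so the symbol's
// length doesn't matter; Text.Trim removes leftover padding spaces
= Table.AddColumn(Source, "Clean Name",
    each Text.Trim(Text.AfterDelimiter([Name], [Symbol])),
    type text)
```

Note that Text.AfterDelimiter keeps the text that follows the symbol; if in your data the symbol trails the name (as in “Bitcoin BTC”), the counterpart Text.BeforeDelimiter([Name], [Symbol]) would keep the name portion instead.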
Solution #2: Using Text.RemoveRange Function
The second solution is more complex but demonstrates the flexibility of Power Query’s M language. For this approach:
Go to the Add Column tab
Select Custom Column
Use the Text.RemoveRange function to precisely remove the symbol part
This function works by removing a specific range of characters from the text. It requires three parameters:
The source text (our name column)
The starting position (0‑based in Power Query)
The number of characters to remove (which we determine using the length of the symbol)
Remember that Power Query uses zero-based indexing, meaning the first character position is 0, not 1. This is crucial when working with text manipulation functions.
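A custom column using Text.RemoveRange might look like the following sketch (assuming the symbol appears at the end of the name, preceded by a single space, as in “Bitcoin BTC”):

```
// Remove the trailing " BTC"-style suffix: the space plus the symbol
= Table.AddColumn(Source, "Clean Name",
    each Text.RemoveRange(
        [Name],
        Text.Length([Name]) - Text.Length([Symbol]) - 1,  // offset of the space
        Text.Length([Symbol]) + 1),                       // space + symbol
    type text)
```

For “Bitcoin BTC”, the offset is 11 − 3 − 1 = 7 and the count is 4, so the four characters “ BTC” are removed, leaving “Bitcoin”.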
Finalizing the Transformation
After applying either solution, we need to clean up our query:
Rename the newly created column to something meaningful (like “name”)
Delete the original name column that contained the symbols
If needed, restore any previously removed columns by deleting the “Removed Columns” step
Arrange columns in the desired order by dragging them
In my example, I also demonstrated how to clean another column (circulating supply) that had a similar issue but with a simpler pattern. Since all the unwanted symbols in this column were preceded by a space, I could use the Extract Text Before Delimiter function with a space as the delimiter.
Loading the Cleaned Data Back to Excel
Once all transformations are complete, we can load the cleaned data back to Excel:
Go to the Home tab in Power Query
Select Close & Load To
Choose whether to load to a new worksheet or an existing location
In my demonstration, I loaded the data to cell A13 on the existing worksheet, which allows for easy comparison between the original and cleaned data.
Why This Approach Matters
The beauty of using Power Query for this type of data cleaning is that it creates a repeatable process. If your data source updates, you can simply refresh the query, and all the cleaning steps will be reapplied automatically. This saves significant time compared to manual cleaning or using formulas that would need to be copied down for each new set of data.
Additionally, by leveraging information from one column to clean another, we create a more robust solution that can handle variations in the data format without breaking. This approach is far superior to hard-coding specific text values or character counts that might change across different datasets.