Power Query offers powerful solutions for cleaning data across columns by leveraging information contained in related columns. In this article, I’ll demonstrate how to efficiently remove cryptocurrency symbols from a name column when those symbols already exist in a separate column, even when the symbols have varying lengths.
This technique is particularly valuable when dealing with inconsistent data formats that require standardization for analysis.
Watch my step-by-step tutorial video to see these techniques in action:
Understanding the Data Cleaning Challenge
When working with financial data, especially cryptocurrency information, you’ll often encounter datasets where the same information appears redundantly across multiple columns. In my example, I have a table containing cryptocurrency data where the name column includes both the cryptocurrency name and its symbol (like “Bitcoin BTC”), while a separate symbol column already contains just the symbol (“BTC”).
The challenge lies in cleaning the name column by removing these redundant symbols. What makes this particularly tricky is that cryptocurrency symbols vary in length — some might be three characters (BTC), others four (DOGE) or more. This inconsistency means we can’t simply remove a fixed number of characters from the end of each name.
Preparing the Data in Power Query
To begin the cleaning process, we need to load our data into Power Query where we can apply the necessary transformations:
- Navigate to the Data tab in Excel
- Select your data range and choose to load it to Power Query
- Remove any unnecessary columns to focus on the task at hand
In my case, I initially removed several columns to focus on just the key columns for this demonstration — the name column containing both cryptocurrency names and symbols, and the symbol column containing just the symbols. This simplified approach allows us to concentrate specifically on the cleaning technique without distraction.
Solution #1: Using Text After Delimiter
The first solution uses Power Query’s built-in functions to extract the relevant part of the text. This approach is more straightforward and uses the UI-based options:
To implement this solution:
- Select the name column in Power Query
- Go to the Add Column tab
- Choose Extract and then Text After Delimiter
Initially, when using the UI, you might try entering a specific symbol like “BTC” as the delimiter. However, this would only work for one cryptocurrency. The real power comes from modifying the formula to use the symbol column as a reference.
After creating the initial formula, you’ll need to edit it in the formula bar to make it dynamic. Replace the hardcoded delimiter text with a reference to the symbol column. This way, Power Query will use each row’s corresponding symbol to extract just the cryptocurrency name, regardless of the symbol’s length.
Solution #2: Using Text.RemoveRange Function
The second solution is more complex but demonstrates the flexibility of Power Query’s M language. For this approach:
- Go to the Add Column tab
- Select Custom Column
- Use the Text.RemoveRange function to precisely remove the symbol part
The formula will look something like:
Text.RemoveRange([name], 0, Text.Length([symbol]))
This function works by removing a specific range of characters from the text. It requires three parameters:
- The source text (our name column)
- The starting position (0‑based in Power Query)
- The number of characters to remove (which we determine using the length of the symbol)
Remember that Power Query uses zero-based indexing, meaning the first character position is 0, not 1. This is crucial when working with text manipulation functions.
Finalizing the Transformation
After applying either solution, we need to clean up our query:
- Rename the newly created column to something meaningful (like “name”)
- Delete the original name column that contained the symbols
- If needed, restore any previously removed columns by deleting the “Removed Columns” step
- Arrange columns in the desired order by dragging them
In my example, I also demonstrated how to clean another column (circulating supply) that had a similar issue but with a simpler pattern. Since all the unwanted symbols in this column were preceded by a space, I could use the Extract Text Before Delimiter function with a space as the delimiter.
Loading the Cleaned Data Back to Excel
Once all transformations are complete, we can load the cleaned data back to Excel:
- Go to the Home tab in Power Query
- Select Close & Load To
- Choose whether to load to a new worksheet or an existing location
In my demonstration, I loaded the data to cell A13 on the existing worksheet, which allows for easy comparison between the original and cleaned data.
Why This Approach Matters
The beauty of using Power Query for this type of data cleaning is that it creates a repeatable process. If your data source updates, you can simply refresh the query, and all the cleaning steps will be reapplied automatically. This saves significant time compared to manual cleaning or using formulas that would need to be copied down for each new set of data.
Additionally, by leveraging information from one column to clean another, we create a more robust solution that can handle variations in the data format without breaking. This approach is far superior to hard-coding specific text values or character counts that might change across different datasets.