There is no automatic function to remove duplicate rows in Calc. In order to remove duplicate rows you need to perform some comparison tests. This tutorial will show you how to
- sort the data
- compare the data
- remove the duplicate rows
There is a tutorial here that will show you how to check for duplicates on a single column value, however often we need to compare for more than one value in a row.
Firstly open your spreadsheet and determine the values that you are going to test. In our sample spreadsheet, I am testing for duplicates that match the OrderDate, Region and Rep fields.
Select & Sort the Cells
Select all cells of the current data range, and then choose Data >> Sort.
Select the column names for the data you wish to compare. For example, I am testing to see if OrderDate, Region & Rep are replicated in this spreadsheet. I will sort according to these columns in ascending order.
Compare the Data
Click an empty cell in the first row. In this example it’s H2.
Enter the formula:
This will display 1 if the current row has the same value in column A as the next row. It will display 0 if the values are different.
I labelled this column ‘Test A’ (as we are testing the values in column A).
Now because I want to also test the values in columns B and C, then I need to do similar comparisons.
So, in cell I2 I enter the formula:
and I labelled this column ‘Test B’
And in cell J2 I enter the formula:
and I labelled this column ‘Test C’.
Copy the formula from H2 to J2 down for all rows of the data range.
Freeze the Contents of Test Columns
Now the formulas must be replaced by their values to freeze the contents. To do this we highlight all the cells that have the test values in them. In this example, I have test values in columns, H, I and J, and so I highlight the cells from H2 to J48.
While these are highlighted, I then press Ctrl+C (or CMD+C for Macs) to copy all selected cells to the clipboard.
I then select Edit >> Paste Special… to open the ‘Paste Special’ dialog box.
In the Selection area, enable only the Numbers command (and Text – sometimes it works better if you enable both – see above image). Disable the other Selection commands. Click OK.
You may be prompted to confirm that you are overwriting existing data – click ‘Yes’.
Now in column K, I am going to add the three test values together. If all three are true (1), then we have a duplicate row.
So, in cell K2 I enter the formula:
And I label the column ‘Sum Tests’
And I copy the formula from K2 into all the other rows in column K.
Sort and Remove Data
I then select the whole data range including the new columns H, I, J & K and sort the range by column K (my ‘Sum Tests’ column). I usually sort this in a descending order.
You will notice in the ‘Sum Tests’ columns that the higher values will appear at the top. In this case, as we were testing for 3 criteria for our duplicates, if a duplicate exists, then the value in this column will be 3. All ‘3’ values represent a duplicate value. If you only test for 2 values, then returning a ‘2’ in the ‘Sum Test’ column will indicate a duplicate.
In this example, I will now delete all the rows that return a ‘3’ in my ‘Sum Test’ column.
Obviously I don’t want to keep my test columns (columns H, I, J & K in the above image), so I can now delete these as well.