How to Remove Duplicate Rows in Calc [OOo, LibreOffice & NeoOffice]

There is no automatic function to remove duplicate rows in Calc. In order to remove duplicate rows you need to perform some comparison tests. This tutorial will show you how to

  • sort the data
  • compare the data
  • remove the duplicate rows

There is a tutorial here that will show you how to check for duplicates on a single column value, however often we need to compare for more than one value in a row.

Firstly open your spreadsheet and determine the values that you are going to test. In our sample spreadsheet, I am testing for duplicates that match the OrderDate, Region and Rep fields.

Select & Sort the Cells

Select all cells of the current data range, and then choose Data >> Sort.

Select the column names for the data you wish to compare. For example, I am testing to see if OrderDate, Region & Rep are replicated in this spreadsheet. I will sort according to these columns in ascending order.

Compare the Data

Click an empty cell in the first row. In this example it’s H2.

Enter the formula:
=IF(A2=A3;1;0)

This will display 1 if the current row has the same value in column A as the next row. It will display 0 if the values are different.

I labelled this column ‘Test A’ (as we are testing the values in column A).

Now because I want to also test the values in columns B and C, then I need to do similar comparisons.

So, in cell I2 I enter the formula:
=IF(B2=B3;1;0)
and I labelled this column ‘Test B’

And in cell J2 I enter the formula:
=IF(C2=C3;1;0)
and I labelled this column ‘Test C’.

 Copy the formula from H2 to J2 down for all rows of the data range.

Freeze the Contents of Test Columns

Now the formulas must be replaced by their values to freeze the contents. To do this we highlight all the cells that have the test values in them. In this example, I have test values in columns, H, I and J, and so I highlight the cells from H2 to J48.

While these are highlighted, I then press Ctrl+C (or CMD+C for Macs) to copy all selected cells to the clipboard.

I then select Edit >> Paste Special… to open the ‘Paste Special’ dialog box.

In the Selection area, enable only the Numbers command (and Text – sometimes it works better if you enable both – see above image). Disable the other Selection commands. Click OK.

You may be prompted to confirm that you are overwriting existing data – click ‘Yes’.

Now in column K, I am going to add the three test values together. If all three are true (1), then we have a duplicate row.

So, in cell K2 I enter the formula:
=SUM(H2:J2)
And I label the column ‘Sum Tests’

And I copy the formula from K2 into all the other rows in column K.

Sort and Remove Data

I then select the whole data range including the new columns H, I, J & K and sort the range by column K (my ‘Sum Tests’ column). I usually sort this in a descending order.

You will notice in the ‘Sum Tests’ columns that the higher values will appear at the top. In this case, as we were testing for 3 criteria for our  duplicates, if a duplicate exists, then the value in this column will be 3. All ‘3’ values represent a duplicate value. If you only test for 2 values, then returning a ‘2’ in the ‘Sum Test’ column will indicate a duplicate.

In this example, I will now delete all the rows that return a ‘3’ in my ‘Sum Test’ column.

Tidying Up

Obviously I don’t want to keep my test columns (columns H, I, J & K in the above image), so I can now delete these as well.

Comments

  1. Thx for your examples, but how to delete dublicate rows if I have only one column?
    ex.gr:
    Ontario
    Toronto
    Vancouver
    Ontario

    So I need only one Ontario, please help. Thx

  2. thank you..

  3. For a simple approach to removing duplicates you could:
    1. Use ‘sort ascending’ then add an extra column (you can remove it later)
    2. Add the branching statement (IF) as mentioned in previous posts eg: =IF(A2=A3,1,0)
    copy and paste the formula into all the cells in the new column. (left click and hold, then drag down)
    This will place a ‘0’ in the column where no match is found.

    3. Now add auto filter to the new column (the one with the 1 or zero) and filter for zero
    This will only show the those rows where no duplication has occurred.
    4. Copy all the rows and paste into new spreadsheet if desired.

    OR

    You could on step ‘3’ filter for 1 rather than zero and delete the results, this will delete the duplicate rows in your spreadsheet.

    I hope this helps someone, spreadsheets can be a real pain sometimes when you get stuck.

Speak Your Mind

*