Delete Identical Geometries based on OID with ArcPy & ArcGIS Pro

Why create this workflow...

Every week I put time towards answering questions on three main platforms; GIS StackExchange, Esri Communities, and Reddit. I go under the alias Clubdebambos on each of them. This week I helped g3odood on Reddit who was looking to remove duplicate records based on geometry, but needed to keep the records from the duplicate groups that had the lowest OID, while maintaining attributes from the original table. There is currently no out-of-box tool to achieve this. In the end, it was a mixture of ArcPy tools and functions mixed with some basic Python to create the custom tool that enables a user to keep the duplicate based on the highest (max) or lowest (min) OID within the duplicates. 

Creating The Python Script

As always, we start by importing arcpy.

				
					import arcpy
				
			

We require three user inputs; in_features (Type: Feature Layer), out_features (Type: Feature Class), and keep_based_on (Type: String)

				
					## input feature class
in_features = arcpy.GetParameterAsText(0)

## output feature class
out_features = arcpy.GetParameterAsText(1)

## keep duplicate based on min or max oid
keep_based_on = arcpy.GetParameterAsText(2)
				
			

We need to know what the OID field is from the in_features dataset. The OID field could be OBJECTID, FID, OID, or any other name, but a dataset will only ever have one field of type OID. 

Here, we also instantiate and empty dictionary. This dictionary will hold the duplicate group number from the FEAT_SEQ, field when we use the Find Identical tool, as the keys, and the minimum or maximum OID from each FEAT_SEQ (duplicate) group as the values.

				
					## the OID field is not always OBJECTID, this gets the OID field from the input
oid_fld = [fld.name for fld in arcpy.ListFields(in_features) if fld.type=="OID"][0]

## this dictionary will hold the min/max oid from the indentical groups returned
## from the FindIdentical tool (FEAT_SEQ)
keep_oids = {}
				
			

Now, we move onto the main workflows. We utilise the Find Indentical tool from the Data Management toolbox in ArcGIS Pro to create a table in our memory workspace (this prevents the need to write the table to disk). By supplying “SHAPE” as the fields parameter, we are looking for records that have duplicate geometry only.

				
					## call the find identical tool, this will return a table with the features
## grouped by OBJECTID (IN_FID) and FEAT_SEQ which is their group id.
## we use the memory workspace so the table doesnt need to be written to disk
identical = arcpy.management.FindIdentical(
    in_dataset = in_features,
    out_dataset = "memory\\out_fc",
    fields="SHAPE"
)
				
			

The above is indicative of the table returned from the Find Identical tool. The FEAT_SEQ field groups any duplicates found by a matching number and the IN_FID field is the original OID of the feature.

We use a Search Cursor to get a list of tuples containing IN_FID and FEAT_SEQ. Based on the example above thisn would be; [(1,1), (2,2), (3,3), (4,3), (5,4), (6,4), (7,5), (8,6), (9,7), (10,7)]

We then use some Python Magic to update our keep_oids dictionary, where the key is the unique FEAT_SEQ and the value is either the min or max OID relating to that FEAT_SEQ (duplicate group).

From our dictionary values, which represent or OIDs to keep, we create a list of strings – needed for using the Python string join function.

Lastly, we create a where clause SQL expression from our list of OIDs. This where clause will be used in the Export Features tool.

				
					## get a list of tuples representing (IN_FID, FEAT_SEQ)
data = [row for row in arcpy.da.SearchCursor(identical, ["IN_FID", "FEAT_SEQ"])]

## iterate through (IN_FID, FEAT_SEQ) tuples and store the minimum OID value
## per identical group found
if keep_based_on == "Minimum OID":
    for oid, group in data:
        if group not in keep_oids or oid < keep_oids[group]:
            keep_oids[group] = oid
elif keep_based_on == "Maximum OID":
    for oid, group in data:
        if group not in keep_oids or oid > keep_oids[group]:
            keep_oids[group] = oid

## convert list of OIDs to strings for use in join below
str_oids = [str(oid) for oid in keep_oids.values()]

## create a where clause for the export features tool
## update OBJECTID field name if necessary (courld be FID, OID etc)
where_clause = "{0} IN ({1})".format(oid_fld, ",".join(str_oids))
				
			

Our job is done and we can now export from the original in_feature using the Export Features tool from the Conversion toolbox and utilise the optional where clause parameter to subset the data based on whet we want to keep.

				
					## export from original dataset using the where clause to nsubset the data
arcpy.conversion.ExportFeatures(
    in_features = in_features,
    out_features = out_features,
    where_clause = where_clause
)
				
			

Don’t forget your sweeping-brush and clean-up the memory workspace.

				
					arcpy.management.Delete(identical)

				
			

Creating the tool in ArcGIS Pro

Save the script an open up ArcGIS Pro. Right-click on your toolbox/toolset of choice and select New > Script. The New Script window will appear. In the General tab set Name to deleteIdenticalGeomByOIDLabel to Delete Identical Geometry by OID, and the Description as per below or any description you feel is apt.

In the Parameters tab set as below. Set the Data Type for the Input Features to Feature Layer, and for the Output Feature Class to Feature Class (and the Direction to Output). Set the Data Type for Keep Based On to String and create a Filter Value List that contains Minimum OID and Maximum OID, you can also set the Default to Minimum OID.

In the Execution tab, click the folder icon in the top-right corner and add your saved Python script. 

Click OK and you are ready to take your new tool for a spin.

Full Python Script

				
					import arcpy

################################################################################
## Esri Documentation
##  https://pro.arcgis.com/en/pro-app/3.2/arcpy/functions/getparameterastext.htm
##  https://pro.arcgis.com/en/pro-app/3.2/arcpy/functions/listfields.htm
##  https://pro.arcgis.com/en/pro-app/3.2/tool-reference/data-management/find-identical.htm
##  https://pro.arcgis.com/en/pro-app/3.2/arcpy/data-access/searchcursor-class.htm
##  https://pro.arcgis.com/en/pro-app/3.2/tool-reference/conversion/export-features.htm
##  https://pro.arcgis.com/en/pro-app/3.2/tool-reference/data-management/delete.htm
##
## ArcGIS Pro Version: 3.2.0
##
################################################################################

################################################################################
## USER INPUTS #################################################################

## input feature class
in_features = arcpy.GetParameterAsText(0)

## out put feature class
out_features = arcpy.GetParameterAsText(1)

## keep duplicate based on min or max oid
keep_based_on = arcpy.GetParameterAsText(2)

################################################################################
## REQUIRED OBJECTS ############################################################

## the OID field is not always OBJECTID, this gets the OID field from the input
oid_fld = [fld.name for fld in arcpy.ListFields(in_features) if fld.type=="OID"][0]

## this dictionary will hold the min/max oid from the indentical groups returned
## from the FindIdentical tool (FEAT_SEQ)
keep_oids = {}

################################################################################
## CREATE IDENTICAL TABLE BASED ON GEOMETRY ####################################

## call the find identical tool, this will return a table with the features
## grouped by OBJECTID (IN_FID) and FEAT_SEQ which is their group id.
## we use the memory workspace so the table doesnt need to be written to disk
identical = arcpy.management.FindIdentical(
    in_dataset = in_features,
    out_dataset = "memory\\out_fc",
    fields="SHAPE"
)

################################################################################
## GET OIDS TO KEEP ############################################################

## get a list of tuples representing (IN_FID, FEAT_SEQ)
data = [row for row in arcpy.da.SearchCursor(identical, ["IN_FID", "FEAT_SEQ"])]

## iterate through (IN_FID, FEAT_SEQ) tuples and store the minimum OID value
## per identical group found
if keep_based_on == "Minimum OID":
    for oid, group in data:
        if group not in keep_oids or oid < keep_oids[group]:
            keep_oids[group] = oid
elif keep_based_on == "Maximum OID":
    for oid, group in data:
        if group not in keep_oids or oid > keep_oids[group]:
            keep_oids[group] = oid

## convert list of OIDs to strings for use in join below
str_oids = [str(oid) for oid in keep_oids.values()]

## create a where clause for the export features tool
## update OBJECTID field name if necessary (courld be FID, OID etc)
where_clause = "{0} IN ({1})".format(oid_fld, ",".join(str_oids))

################################################################################
## EXPORT DATA TO DISK #########################################################

## export from original dataset using the where clause to nsubset the data
arcpy.conversion.ExportFeatures(
    in_features = in_features,
    out_features = out_features,
    where_clause = where_clause
)

################################################################################
## CLEAN-UP MEMORY WORKSPACE ##################################################

arcpy.management.Delete(identical)

################################################################################

				
			

Leave a Comment

Your email address will not be published. Required fields are marked *