I was recreating the ArcPy Feature Vertices to Points tool using GeoPandas and Shapely and I was unimpressed with the length of time it took to perform against its ArcPy counterpart when it came to processing dangles. However, that was until I applied a spatial index to the workflow, which rapidly sped up matters. Rapidly being an understatement! Generating a spatial index with Geopandas sindex uses the rtree method. Apparently quad-tree is faster but I’m very impressed with the results here.
def check_if_dangle(point, gdf):
"""
Return a count of the linear features a point intersects, if it is more than
one then it is not a dangle, exactly one and it is a dangle
args:
point the point to assess if its a dangle or not
gdf the entire GeoDataFrame to assess the point against
return
count (int) the number of linear features a point intersects
"""
## Create a buffered bounding box around the point with the specified tolerance (set here as 0.01)
buffered_bbox = point.buffer(0.01).bounds
## Convert the buffered bounding box to a Polygon
buffered_bbox_polygon = box(*buffered_bbox)
## initiate count, which is how many lines the buffered_bbox_polygon intersects
## more than one and it is not a dangle
count = 0
## for each row (linear feature in the dataset)
for index, row, in gdf.iterrows():
## get the geometry of the line
line = row["geometry"]
## if the buffered_bbox_polygon intersects the line, increase the count by 1
if buffered_bbox_polygon.intersects(line):
count += 1
return count
This took 6 minutes and 8 seconds to complete on a dataset that contained 97 records with mulitpart geometries that broke out to 1555 parts.
Introducing the Spatial Index
## spatial index to rapidly speed up processing of larger datasets
spatial_index = gdf.sindex
def check_if_dangle(point, gdf):
"""
Return the count of the linear features a point intersects, if it is more than
one then it is not a dangle, exactly one and it is a dangle
args:
point the point to assess if its a dangle or not
gdf the entire GeoDataFrame to assess the point against
return
count (int) the number of linear features a point intersects
"""
## Create a buffered bounding box around the point with the specified tolerance
buffered_bbox = point.buffer(0.01).bounds
## Convert the buffered bounding box to a Polygon
buffered_bbox_polygon = box(*buffered_bbox)
## Use the spatial index to efficiently find intersecting line features within the buffered bounding box
possible_matches_index = list(spatial_index.intersection(buffered_bbox))
## initiate count, which is how many lines the buffered_bbox_polygon intersects
## more than one and it is not a dangle
count = 0
## for possible match from the spatial index (for each index)
for index in possible_matches_index:
line = gdf.geometry.iloc[index]
if buffered_bbox_polygon.intersects(line):
count += 1
return count
This only took 1.9 seconds!! to return the same output.