Geopandas, The difference a Spatial Index Makes!

I was recreating the ArcPy Feature Vertices to Points tool using GeoPandas and Shapely and I was unimpressed with the length of time it took to perform against its ArcPy counterpart when it came to processing dangles. However, that was until I applied a spatial index to the workflow, which rapidly sped up matters. Rapidly being an understatement! Generating a spatial index with Geopandas sindex uses the rtree method. Apparently quad-tree is faster but I’m very impressed with the results here.

def check_if_dangle(point, gdf):
    """
    Return a count of the linear features a point intersects, if it is more than
    one then it is not a dangle, exactly one and it is a dangle

    args:
        point   the point to assess if its a dangle or not
        gdf     the entire GeoDataFrame to assess the point against

    return
        count (int) the number of linear features a point intersects

    """

    ## Create a buffered bounding box around the point with the specified tolerance (set here as 0.01)
    buffered_bbox = point.buffer(0.01).bounds

    ## Convert the buffered bounding box to a Polygon
    buffered_bbox_polygon = box(*buffered_bbox)

    ## initiate count, which is how many lines the buffered_bbox_polygon intersects
    ## more than one and it is not a dangle
    count = 0

    ## for each row (linear feature in the dataset)
    for index, row, in gdf.iterrows():
        ## get the geometry of the line
        line = row["geometry"]
        ## if the buffered_bbox_polygon intersects the line, increase the count by 1
        if buffered_bbox_polygon.intersects(line):
            count += 1

    return count

This took 6 minutes and 8 seconds to complete on a dataset that contained 97 records with mulitpart geometries that broke out to 1555 parts.

Introducing the Spatial Index

## spatial index to rapidly speed up processing of larger datasets
spatial_index = gdf.sindex

def check_if_dangle(point, gdf):
    """
    Return the count of the linear features a point intersects, if it is more than
    one then it is not a dangle, exactly one and it is a dangle

    args:
        point   the point to assess if its a dangle or not
        gdf     the entire GeoDataFrame to assess the point against

    return
        count (int) the number of linear features a point intersects

    """

    ## Create a buffered bounding box around the point with the specified tolerance
    buffered_bbox = point.buffer(0.01).bounds

    ## Convert the buffered bounding box to a Polygon
    buffered_bbox_polygon = box(*buffered_bbox)

    ## Use the spatial index to efficiently find intersecting line features within the buffered bounding box
    possible_matches_index = list(spatial_index.intersection(buffered_bbox))

    ## initiate count, which is how many lines the buffered_bbox_polygon intersects
    ## more than one and it is not a dangle
    count = 0

    ## for possible match from the spatial index (for each index)
    for index in possible_matches_index:
        line = gdf.geometry.iloc[index]
        if buffered_bbox_polygon.intersects(line):
            count += 1

    return count

This only took 1.9 seconds!! to return the same output.

Leave a Comment

Your email address will not be published. Required fields are marked *