I am working to plot a large GIS dataset that I have shown a previous sample of about 1/6 of the data. I am happy with how fast the data loads and
bokeh renders html almost instantly. However, I have come across a fairly active loop in my code that is not scaling well as I increase the 1) number of rows and 2) the resolution of the polygons. I just got killed in the
#count points loop and I wonder if there isn't a better way to do this?
I found the suggestion for a loop from a GIS readthedoc.io and was happy with its performance for a few thousand points a couple of months ago. But now the project needs to process a
GeoDataFrame with> 730000 rows. Am I supposed to use a better method to count the number of points in each polygon? I'm at a modern desk to do the calculation, but the project has access to Azure resources, maybe that's the majority of people who do this type of calculation professionally? I'd rather do the calculation locally, but it means my desktop might have to wait for maximum CPU cycles overnight or longer, which is not an exciting prospect. I am using Python 3.8.2 and Conda 4.3.2.
from shapely.geometry import Polygon import pysal.viz.mapclassify as mc import geopandas as gpd def count_points(main_df, geo_grid, levels=5): """ outputs a gdf of polygons with a columns of classifiers to be used for color mapping """ pts = gpd.GeoDataFrame(main_df("geometry")).copy() #counts points pts_in_polys = () for i, poly in geo_grid.iterrows(): pts_in_this_poly = () for j, pt in pts.iterrows(): if poly.geometry.contains(pt.geometry): pts_in_this_poly.append(pt.geometry) pts = pts.drop((j)) nums = len(pts_in_this_poly) pts_in_polys.append(nums) geo_grid('number of points') = gpd.GeoSeries(pts_in_polys) #Adds number of points in each polygon # Adds Quantiles column classifier = mc.Quantiles.make(k=levels) geo_grid("class") = geo_grid(("number of points")).apply(classifier) # Adds Polygon grid points to new geodataframe geo_grid("x") = geo_grid.apply(getPolyCoords, geom="geometry", coord_type="x", axis=1) geo_grid("y") = geo_grid.apply(getPolyCoords, geom="geometry", coord_type="y", axis=1) polygons = geo_grid.drop("geometry", axis=1).copy() return polygons