When I was working on the new functions described in previous post I found that the distance calculation in general is very heavy and slow. The distance function gets two geometries to find the shortest distance in between. The approach has been to calculate the distance between all possible combinations of vertex-vertex and vertex-edge between the two geometries. That means that two geometries with 1000 vertexes each causes one million iterations and even if computers are fast, that takes some time.
The ideas how to make it faster came to me by the time of the birth of my son. I guess you get some extra boost from something like that. I was home from job for 10 days to help my wife and son, and I did, I promise But I also had time to try some ideas of getting distance calculations faster. Because of this I call it Birgers Boost from my son Birger.
The idea was to find a way to not do this distance calculation between all and every vertexes. I thought that at least the ones behind the middle of the geometry must be possible to avoid. I imagined like a wall that I projected against the geometries and then I could sort the vertexes as they appear on the other side of the wall as I move it through the geometry. I guess it maybe doesn’t make sense but I thought it was a little fun to describe how the idea appeared. The resulting algorithm uses a line from the middle of the first geometry to the middle of the second geometry. Then it orders the vertexes along that line and calculates the distances in the order of how close they are along that line. The big difference from the old function is that the preparation here, giving the vertexes a value along this line only happens once per vertex. So in the example of 1000 vertexes per geometry it takes only 2000 calculations to get those values. Then, when the vertexes is ordered we can do the distance calculations in the right order. And when the distance between those abstract walls that I imagined is bigger than the smallest found distance, then we know that the shortest distance is found. How many distances we have to calculate before we know this will vary depending on how the geometries is related to each other.
From the testing we have done it seems like it in general gives a quite good increase in speed. For larger geometries it is between 10 and 100 times faster than the old algorithm. In some special cases it is not that fast and in some cases it is even faster.
This way of doing it will not work if the geometries overlap. The easiest way to be sure they don’t overlap is to check for overlapping bounding boxes. So, if there is overlapping bounding boxes the calculation is sent to the old hard way of doing it. The same is the situation if one of the geometries is a point because then there is no gain to get. Then it is done the same way as before
This is a problem but hopefully this will be solved. Paul Ramsey have come up with ideas that might make my way of doing it short lived, see his blog:
He is mostly discussing his new geography functions but probably it will be a good way of doing it for geometry too. So in PostGIS 2.0 the development will continue
Those distance calculations enhancements might be quite important because it makes it possible to calculate directly with the geometries in nearest neighbor calculations and thing like that instead of using the centroids. Using points will still be faster bu sometimes it may be useful to be able to run on the whole geometry and before it was often more or less impossible because of too heavy calculations.
This will be in PostGIS 1.5. A Beta release will hopefully be out soon. For windows there is experimental builds already available here:
And of course the source code is available to compile for other platforms.
I have wrote some lines in the wiki too, to describe this