Lately I have spent some time on a compressed binary output format from PostGIS. It is so far just some sort of “proof of concept”.
The idea is a binary format with some of the features found in common text-based gis formats. The main features is:
- Controllable precision (number of decimals)
- Relative coordinates
- The ID of the geoemtry is stored in the geoometry
I have called the format Tiny WKB since the wkb-format was the closest I know of. But probably the name should be something different since wkb means “well known binary” and this is not “well known”. But Tiny WKB or twkb will have to do for now. The function in PostGIS to create a twkb geometry is I have called ST_ASTWKB, ST_ASTWKB(geometry, precision, ID, endianess).
The sourse code for the TWKB creator is found in http://svn.osgeo.org/postgis/spike/nicklas/twkb.
Just get it with subversion and compile as usual with PostGIS.
So, let’s take a look at the advertised features:
Control over number of decimals
The geometries stored in the database have far more precision than needed for presentation purposes. Often when showing a map on the web, a precision more than one meter is overkill, even when zoomed in. So, if you are using a meter based projection and you want just full meter precision you set the precision parameter to 0. If you want 10 meters precision you set precision to -1.
For instance a line like
‘LINESTRING(352400 6752414, 352415 6752418, 352452 6752402)’
, with relative coordinates looks like this:
‘LINESTRING(352400 6752414, 15 4, 37 -16)’
The more coordinates in a geometry the more space we save by using relative coordinates. This is used in formats like SVG and TopoJSON. It gets more complicated when dealing with a binary format since there is no separator between the numbers. In wkb-format for instance that is no problem since all numbers uses the same number of bytes or bits. The reader just counts the bits and knows where the number stops. But then we would gain nothing from our relative coordinates. So twkb handles 3 different storage sizes of the coordinates 1, 2 or 4 bytes.
Storage of the geometry ID inside the geometry
In the header of the geometry there is a 4 byte integer for storing an ID of the geometry. This gives some possibilities. For instance we can write an aggregating variant of ST_ASTWKB. Then we can aggregate the twkb geometries to geometry collections grouped by intersection with a grid. Then we get vector-tiles directly from the database with ID inside the tile to each single geometry. So at client side each geometry can be identified and joined to it’s attribute data.
The ID should also make it easier to implement support for typologies. All edges can be sent separately with included ID.
OK, so how small does it get
To give some numbers in bytes:
||TWKB incl 4 bytes of ID
|LINESTRING(1 1, 10 15)
|LINESTRING(1 1, 10 15, 22 30)
Ok, you get the point. Bigger geometries gains more from twkb than smaller. But the gain gets smaller if the need of precision is higher. If we take the last example and wants to store 3 decimals the difference is smaller. Also if the distance between the coordinates increases we need more space to store the relative coordinates. There is also an overhead in changes of sizes. So, to make it extreme:
||TWKB incl 4 bytes of ID
|LINESTRING(1 1, 1000000 15, 1010 30)
TWKB is still quite a lot smaller than WKB but the difference is smaller.
But how fast is it?
That is not easy to answer. It takes some overhead to create the TWKB since the geometry have to be analyzed in the database and each coordinate calculated, not just copied. But that overhead seems to disappear in the gain of decreased IO.
I have a layer of all the roads in Norway. Some stats of the layer:
Number of linestrings: 1224248
Total number of vertex points: 23485321
To just check the cost of creating WKB vs TWKB we can do like this (and get the total size as bonus):
SELECT SUM(LENGTH(ST_ASBinary(geom))) FROM veger;
that takes on this machine 979 ms
The corresponding query for TWKB looks like this:
SELECT SUM(LENGTH(ST_ASTWKB(geom,0,gid,’NDR’))) FROM veger;
and takes 2770 ms
So we have an almost 2 seconds overhead.
But if we instead of just checking the size of the result actually puts the result in a table:
CREATE TABLE wkb_veger as
SELECT ST_ASBinary(geom) geom FROM veger;
takes 6437 ms
CREATE TABLE wkb_veger as
SELECT ST_ASTWKB(geom,0,gid,’NDR’) geom FROM veger;
takes 3680 ms
So the smaller size even in internal handling in the database eats up the overhead.
To be fair we should mention that we reduce the number of decimals to 0. But actually the original layer had no more than 1 decimal precision even if there is a lot of trailing zeros. So if we create TWKB with 1 decimal instead it takes 4346 ms.
The geometries as WKB uses 368 mb
If stored as TWKB with 0 decimals it uses 63 mb, with 1 decimal 79 mb and with 2 decimals (1 trailing zero), 108 mb. Don’t forget that includes 4 bytes of ID to each geometry.
But it works, and my hope is that other people finds this interesting enough to build interesting clients. It would for instance be very interesting to see how QGIS would react on more slimmed geometries. I think it would give new possibilities to get faster rendering.
The demo can be found here:
It is a Leaflet map. To turn on the TWKB layers check the check boxes in the bottom of the page. I have tested in Chrome and Firefox. As you can see from the timer that appears after a TWKB layer is loaded there is several bottlenecks. I think php seems to work quite slow here and also the addition of the geometries in leaflet. The reading of the TWKB geometries (parsing) is just a very small part of the time it takes. The layers is stored in srid 4326. The first Municipalities layer has 3 decimals and Municipalities HD has 5 decimals. You can see the difference when zooming close. The “Areal Types layer” also has 5 decimals.
Now this is just a “prof of concept” in my sandbox in PostGIS resporitory.
If this sounds interesting give some feedback what is needed to make something good out of it. If you have the possibility it would be very valuable with a better and more sophisticated client. As mentioned QGIS rendering TWKB would be very interesting. If there is an interest I will write a new post describing the technical aspects of the format.
If the interest is big enough it might go into PostGIS some time