Tiny WKB

Lately I have spent some time on a compressed binary output format from PostGIS. It is so far just some sort of “proof of concept”.

The idea is a binary format with some of the features found in common text-based gis formats. The main features is:

  1. Controllable precision (number of decimals)
  2. Relative coordinates
  3. The ID of the geoemtry is stored in the geoometry

I have called the format Tiny WKB since the wkb-format was the closest I know of. But probably the name should be something different since wkb means “well known binary” and this is not “well known”. But Tiny WKB or twkb will have to do for now. The function in PostGIS to create a twkb geometry is I have called ST_ASTWKB, ST_ASTWKB(geometry, precision, ID, endianess).
The sourse code for the TWKB creator is found in http://svn.osgeo.org/postgis/spike/nicklas/twkb.
Just get it with subversion and compile as usual with PostGIS.

So, let’s take a look at the advertised features:

Control over number of decimals

The geometries stored in the database have far more precision than needed for presentation purposes. Often when showing a map on the web, a precision more than one meter is overkill, even when zoomed in. So, if you are using a meter based projection and you want just full meter precision you set the precision parameter to 0. If you want 10 meters precision you set precision to -1.

Relative coordinates

For instance a line like
‘LINESTRING(352400 6752414, 352415 6752418, 352452 6752402)’
, with relative coordinates looks like this:
‘LINESTRING(352400 6752414, 15 4, 37 -16)’

The more coordinates in a geometry the more space we save by using relative coordinates. This is used in formats like SVG and TopoJSON. It gets more complicated when dealing with a binary format since there is no separator between the numbers. In wkb-format for instance that is no problem since all numbers uses the same number of bytes or bits. The reader just counts the bits and knows where the number stops. But then we would gain nothing from our relative coordinates. So twkb handles 3 different storage sizes of the coordinates 1, 2 or 4 bytes.

Storage of the geometry ID  inside the geometry

In the header of the geometry there is a 4 byte integer for storing an ID of the geometry. This gives some possibilities. For instance we can write an aggregating variant of ST_ASTWKB. Then we can aggregate the twkb geometries to geometry collections grouped by intersection with a grid. Then we get vector-tiles directly from the database with ID inside the tile to each single geometry. So at client side each geometry can be identified and joined to it’s attribute data.

The ID should also make it easier to implement support for typologies. All edges can be sent separately with included ID.

OK, so how small does it get

To give some numbers in bytes:

geometry WKB TWKB incl 4 bytes of ID
POINT(1 1) 21 14
LINESTRING(1 1, 10 15) 41 20
LINESTRING(1 1, 10 15, 22 30) 57 22

Ok, you get the point. Bigger geometries gains more from twkb than smaller. But the gain gets smaller if the need of precision is higher. If we take the last example and wants to store 3 decimals the difference is smaller. Also if the distance between the coordinates increases we need more space to store the relative coordinates. There is also an overhead in changes of sizes. So, to make it extreme:

geometry WKB TWKB incl 4 bytes of ID
LINESTRING(1 1, 1000000 15, 1010 30) 57 36

TWKB is still quite a lot smaller than WKB but the difference is smaller.

But how fast is it?

That is not easy to answer. It takes some overhead to create the TWKB since the geometry have to be analyzed in the database and each coordinate calculated, not just copied. But that overhead seems to disappear in the gain of decreased IO.

I have a layer of all the roads in Norway. Some stats of the layer:
Number of linestrings: 1224248
Total number of vertex points: 23485321

To just check the cost of creating WKB vs TWKB we can do like this (and get the total size as bonus):
SELECT SUM(LENGTH(ST_ASBinary(geom))) FROM veger;
that takes on this machine 979 ms

The corresponding query for TWKB looks like this:
SELECT SUM(LENGTH(ST_ASTWKB(geom,0,gid,’NDR’))) FROM veger;
and takes 2770 ms

So we have an almost 2 seconds overhead.

But if we instead of just checking the size of the result actually puts the result in a table:

CREATE TABLE wkb_veger as
SELECT ST_ASBinary(geom) geom FROM veger;

takes 6437 ms

and

CREATE TABLE wkb_veger as
SELECT ST_ASTWKB(geom,0,gid,’NDR’) geom FROM veger;

takes 3680 ms

So the smaller size even in internal handling in the database eats up the overhead.

To be fair we should mention that we reduce the number of decimals to 0. But actually the original layer had no more than 1 decimal precision even if there is a lot of trailing zeros. So if we create TWKB with 1 decimal instead it takes 4346 ms.

The geometries as WKB uses 368 mb

If stored as TWKB with 0 decimals it uses 63 mb, with 1 decimal 79 mb and with 2 decimals (1 trailing zero), 108 mb. Don’t forget that includes 4 bytes of ID to each geometry.

So, the database is quite fast in handling TWKB. But for web-mapping, how about php and javascript?

The DEMO

I think there is quite a lot of optimization to do in my demo. Maybe NodeJS is faster than php for getting the binary data out for example. Also the javascript TWKB reader I have written probably suffers from bad coding.

But it works, and my hope is that other people finds this interesting enough to build interesting clients. It would for instance be very interesting to see how QGIS would react on more slimmed geometries. I think it would give new possibilities to get faster rendering.

The demo can be found here:

http://179.78.156.122/twkb

It is a Leaflet map. To turn on the TWKB layers check the check boxes in the bottom of the page. I have tested in Chrome and Firefox. As you can see from the timer that appears after a TWKB layer is loaded there is several bottlenecks. I think php seems to work quite slow here and also the addition of the geometries in leaflet. The reading of the TWKB geometries (parsing) is just a very small part of the time it takes. The layers is stored in srid 4326. The first Municipalities layer has 3 decimals and Municipalities HD has 5 decimals. You can see the difference when zooming close. The “Areal Types layer” also has 5 decimals.

Summary

Now this is just a “prof of concept” in my sandbox in PostGIS resporitory.
If this sounds interesting give some feedback what is needed to make something good out of it. If you have the possibility it would be very valuable with a better and more sophisticated client. As mentioned QGIS rendering TWKB would be very interesting. If there is an interest I will write a new post describing the technical aspects of the format.

If the interest is big enough it might go into PostGIS some time :-)

Tags: , , ,

9 Responses to “Tiny WKB”

  1. Stephen Knox says:

    I just chanced upon this website and I wanted to say I do find this interesting. Any optimisations on an open binary GIS format have got to be a good thing IMHO.

    I would be interested in the technical aspects of how you achieved it, but as a part-time coder I can’t really offer a great deal of feedback I’m afraid.

    Keep up the good work!

  2. Nicklas Avén says:

    I will come up with a post about the technical aspects. I have tried two different variants of the format.

    But first I will try to get a node variant of the client working. Just to try to prove that it is worth the effort to serialize and deserialize from a binary format.

  3. [...] Nicklas Avéns blog Open Source (mostly) GIS in general and PostGIS in particular « Tiny WKB [...]

  4. [...] know what I am talking about TWKB is a compressed binary export format from PostGIS described here, and [...]

  5. Web Designer says:

    Its not fully complete i guess.

  6. Hmm it looks like your blog ate my first comment (it was extremely long) so I guess I’ll just sum it up what I had written and say, I’m thoroughly enjoying your blog.
    I as well am an aspiring blog blogger but I’m still new to everything.
    Do you have any helpful hints for first-time blog writers?
    I’d genuinely appreciate it.

  7. Its like you learn my mind! You appear to
    know so much approximately this, like you wrote the ebook in it or something.
    I think that you could do with some percent to pressure the message
    home a bit, but instead of that, this is fantastic blog. A great read.
    I will definitely be back.

    Feel free to surf to my web site Secret Lift Bottle

  8. Wow, superb blog layout! How long have you been blogging for?
    you make blogging loook easy. The overall look of
    yopur web site is fantastic, let alone thhe content!

  9. (1917- ) is one of the most influential Pentecostal leaders today.
    Most require a number of sessions, usually 6 to 15. Skin saving surgeries meant to reduce the amount of hair
    or pigmentation, reshape the face or reduce wrinkles or
    scarring are all very popular procedures and can
    make a world of difference in a person’s self-esteem.

Leave a Reply