Comparing TWKB with compressed geoJSON

September 8th, 2013 by Nicklas Avén

One question about TWKB is whether there is any gain compared to just compressing geoJSON. That question is worth some investigation. In php, for instance, you can compress all data on the fly before sending it to the client. The browser then decompresses the message and uses it as normal. This works really well and is fast. So the question is whether it is worth the effort to build a twkb geometry.

I have three reasons why it is.

  1. Twkb seems to actually be smaller than compressed geoJSON.
  2. Compressing takes cpu time and power.
  3. If you take the geometry directly from the database, there will be a lot more data to send from the database to the web server.

To demonstrate this I have published a web page http://178.79.156.122/twkb_test.
It is a little bit messy so I will guide you.

I find the dev tools in Google Chrome nice for comparing the sizes and timing of the compressed data.

First press the button “Get available Layers”. That will send a websocket message to the server to return what layers we can use. Then you should get a list in the list box “Choose layer”.

  1. Ok, now choose “Areal Types” (or whatever you want, but just to follow my numbers).

Now you have 4 buttons to choose from. See the uppermost row in the picture above. The two buttons to the left will both give you twkb geometries. The websocket button to the left sends twkb “as is”, uncompressed, and the second button sends the twkb geometries compressed through php. To the right you get the corresponding buttons for geoJSON.

Behind the twkb-buttons is a query against the database that looks like this:

SELECT ST_AsTWKB(geom,5) FROM prep.n2000_arealtyper;

and the geoJSON query looks like this:

SELECT ST_AsGeoJSON(geom,5) FROM prep.n2000_arealtyper;

So, we are querying the same data in both cases and ask for 5 decimals in both cases. The SRID of the table is 4326, so it is lat lon coordinates. That is why 5 decimals makes sense.

Comparing the websocket and php implementations in timing makes no sense. There are too many unknowns in that. But it is interesting to compare the sizes, to see how much the data gets compressed.

The sizes of the uncompressed data get displayed in the table under the button. I have found no way to get the compressed size in javascript, but in the Google Chrome dev tools you can see it under the Network tab.

Then you see that the compressed version of areal types in geoJSON is 1.2 mb. If you test the uncompressed version of TWKB you will see that it is about 720 kb. geoJSON compresses from 3.6 mb so it gets compressed quite a lot to 1.2 mb. TWKB only compresses to 660 kb from 720 kb so geoJSON compresses much better. But anyway, uncompressed twkb is quite a lot smaller than compressed geoJSON.
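The numbers above can be put side by side with a little arithmetic. This is only a sketch using the sizes quoted in the text, with 3.6 mb and 1.2 mb taken as 3600 and 1200 kb; the variable and function names are made up for the example.

```javascript
// Sizes in kb as quoted in the text (3.6 mb and 1.2 mb taken as 3600 and 1200 kb).
var sizes = {
  geojson: { raw: 3600, compressed: 1200 },
  twkb:    { raw: 720,  compressed: 660 }
};

// Share of the original size that remains, in percent.
function remainingPercent(raw, compressed) {
  return Math.round(100 * compressed / raw);
}

var geojsonRemaining = remainingPercent(sizes.geojson.raw, sizes.geojson.compressed);
var twkbRemaining = remainingPercent(sizes.twkb.raw, sizes.twkb.compressed);
// Uncompressed twkb relative to compressed geoJSON.
var twkbVsGeojson = remainingPercent(sizes.geojson.compressed, sizes.twkb.raw);

console.log(geojsonRemaining, twkbRemaining, twkbVsGeojson);
```

So compressed geoJSON is about a third of its original size, compressed twkb is still about nine tenths of its original size, and uncompressed twkb is roughly 60 percent of compressed geoJSON.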

The differences will vary depending on dataset and number of decimals, but it seems like TWKB comes out smaller across the board.

My second and third arguments from the beginning, for why twkb is a better choice, are about the server load. I have no good way of investigating what takes time on the server, but how long we have to wait before we get any data from the server says something about the work the server needs to do. In the Network tab in the Chrome dev tools you can hold the mouse over the bar to the left showing the time spent on getting the data. Then you will get the timing divided into “waiting” and “receiving” like in the picture below.

The first thing you can note is that my connection is quite slow. Getting 1.2 mb takes about 7 seconds today. So reducing the size of the content sent over the internet is still important, even if you as a developer are sitting on a Gb line. What you can also see, and what is more important, is that you have to wait 1.37 seconds before you start to get anything back, no matter how fancy an internet connection you have. If you do the same thing with twkb, by asking for compressed twkb data, you will probably get a timing around 0.8 seconds. So there is a difference of at least 0.5 seconds.

In both cases all the data gets sent from the database to php before anything gets sent to the client. So what we compare is how long the following steps take for twkb vs geoJSON:

  1. The data is read from disk and the twkb or geoJSON is constructed in the database
  2. The data is sent from the database to php
  3. Php builds the page
  4. The page gets compressed

In all those steps 3.6 mb needs to be handled in the geoJSON case compared to 720 kb in the TWKB case. In the TWKB case the size is already reduced when the twkb geometry gets constructed.

Maybe 0.5 seconds doesn’t sound like much. But this load is not only affecting me, sitting and waiting for a map to show up. It affects resources shared by anyone asking for a map from the same server.

You can play around with the different data sets and compare timing and sizes of twkb vs geojson and compressed vs uncompressed.

Some TWKB updates

September 8th, 2013 by Nicklas Avén

Last week I gave my first talk about TWKB. One good thing about presenting what you are doing is that it makes you actually do it.

So, what is new?

  • TWKB is in PostGIS trunk! That means it will be included in PostGIS 2.2.
  • ID is optional in the spec. In the PostGIS implementation that means that if you don’t add an ID there will be no space occupied for an ID. A minimal point is then only 4 bytes.
  • The one and only serialization method is VarInt, the same way as integers get serialized in Protocol Buffers.
  • I have added a quite generic javascript example of how to read twkb into a geoJSON object.
  • The demo page http://178.79.156.122/twkb_test is refreshed and php examples are added to see the effect of compressing geoJSON compared to twkb. More about that in a separate blog post.
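For readers who have not met VarInt before, here is a minimal sketch of how a non-negative integer is written as a varint in the Protocol Buffers style. It is not taken from the TWKB code and the function names are made up. It also leaves out negative numbers, which need an extra mapping before they can be stored this way.

```javascript
// Encode a non-negative integer as a Protocol Buffers style varint:
// 7 bits of payload per byte, least significant group first, and the
// high bit set on every byte except the last.
function encodeVarint(value) {
  var bytes = [];
  while (value > 127) {
    bytes.push((value % 128) + 128); // low 7 bits, continuation bit set
    value = Math.floor(value / 128);
  }
  bytes.push(value); // last byte, continuation bit clear
  return bytes;
}

// Decode a varint that starts at index 0 of a byte array.
function decodeVarint(bytes) {
  var value = 0;
  var factor = 1;
  for (var i = 0; i < bytes.length; i++) {
    value += (bytes[i] % 128) * factor;
    factor *= 128;
    if (bytes[i] < 128) break; // no continuation bit: done
  }
  return value;
}

console.log(encodeVarint(1));   // one byte
console.log(encodeVarint(300)); // two bytes
```

Small numbers take one byte and bigger numbers grow one byte at a time, so there is no need to decide on a fixed storage size in advance.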

Call for brain power

June 8th, 2013 by Nicklas Avén

I believe that TWKB can be something good. It can be a very fast format for moving geometries around.

If TWKB is going to be something more than a few demos like here and here, more brain power is needed.

My vision for TWKB is that in 2013 it will:

  • be supported in Leaflet and OpenLayers 3
  • be in the trunk for release in PostGIS 2.2
  • be supported by OGR
  • have more than 5 contributors to the specification

What I have so far with TWKB is collected here. It is a GitHub repository, divided into 3 parts.

First part is the specification.

Second part is the PostGIS implementation of the spec (types 1 to 24 are implemented).

The last part is the web-related scripts, like the webserver and the client implementation.

All this above is just meant as something to start the discussion from. The goal is to find a very efficient and flexible binary format for geometries.

For myself I also hope that someone will hire or employ me so I can work with things like this in the daytime :-) I have a lot of ideas I would like to test.

TWKB specification

June 8th, 2013 by Nicklas Avén

To make it possible to discuss what TWKB can be and what it should look like, I have put a first draft of a specification here.

TWKB aggregates

June 8th, 2013 by Nicklas Avén

TWKB (Tiny WKB) can be aggregated and nested. The result is a special type. Since the TWKB geometry holds its own ID (like many text-based gis formats), the result of an aggregation of many TWKB geometries also nests the IDs into the new aggregated TWKB geometry.

This gives us possibilities like creating a type of vector tiles on the fly. I have tried to demonstrate it here, but I didn’t get it as visual as I had hoped.

Those types are described as type 21-24 in the first draft of the TWKB specification.

A few questions and answers about the last posts

May 6th, 2013 by Nicklas Avén

I have had a few questions about TWKB and the websocket DEMO.

When do the geometries actually load?
It is not obvious when the geometries actually get loaded from the server to the browser.

That happens the first time you click a layer. Then the geometries are streamed row by row via websocket to the browser, which parses each geometry and adds it to the map, also geometry by geometry. If you then switch a layer off and on again, it is just loaded from Leaflet internally.

Can TWKB handle more than 2 dimensions?
Yes, TWKB can handle up to 7 dimensions.

Can this websocket approach be used for writing back to the database?
Yes, that is easy to implement. Just send the geometry back to nodejs with ws.send(), and insert it into the database. There is no function to import from TWKB into PostGIS. That is no big thing to write, but I don’t think there is the same performance need when posting back to the database, since that will be one or two geometries, not thousands of them. So the easiest is to just send it back as WKT and use ST_GeomFromText to get it into the database.
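As a rough sketch of the client side of that idea, the function below builds a WKT linestring from an array of [x, y] pairs. The function name and the "wkt" field in the commented-out message are made up for the example; the demo server does not handle such a message.

```javascript
// Build a WKT LINESTRING from an array of [x, y] pairs.
function toWktLineString(coords) {
  var parts = coords.map(function (c) {
    return c[0] + ' ' + c[1];
  });
  return 'LINESTRING(' + parts.join(', ') + ')';
}

var wkt = toWktLineString([[1, 1], [10, 15], [22, 30]]);
console.log(wkt);
// With an open websocket the text could then be sent back, for example:
// ws.send(JSON.stringify({"wkt": wkt}));  // "wkt" is a made-up field name
```

On the server the text would be passed to ST_GeomFromText as a query parameter, together with the srid of the target table.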

Mapservice from Websocket with TWKB

May 5th, 2013 by Nicklas Avén

For those who don’t know what I am talking about: TWKB is a compressed binary export format from PostGIS, described here and here.

It is just in the experimental stage. The source for the PostGIS part can be found here.

The Mapservice
What I find maybe most interesting is the websocket thing. I haven’t played with that before. Maybe this is old news for all of you out there, but websockets work cross-domain. So a websocket can be approached from a page on my webserver or from a page on your desktop. That makes no difference.

You can download the index.htm from the demo:
The DEMO
put it somewhere on your computer, open it with a browser and it should load the maps.

You can also put this in a javascript:

var ws = new WebSocket('ws://178.79.156.122:8088');
ws.onopen = function () {  // send() throws if called before the connection is open
  ws.send(JSON.stringify({"nr": nr, "map_name": map_name}));
};

and you should get a reply if you use one of the map names that my “service” has.

If you don’t know the map names you can send:

ws.send(JSON.stringify({"request":"getcapabilities"}));

and you will get back a json object with some metadata (that is demonstrated in the demo too).

All this is very unfinished, but it shows the idea.

Test it in OpenLayers too
The demo I have written is in Leaflet. That is just because it seemed easier to get started testing this in Leaflet. But it would be very interesting if someone took TWKB for a ride with OpenLayers (3). Since OpenLayers 3 promises WebGL support, a compressed binary format ought to be interesting.

What parameters the websocket takes
Not all of this is even tested, but here is what you can send to the websocket:

  • map_name
  • srid: if not passed, the srid of the table will be used, which can be found as default_srid in the getcapabilities response
  • precision: how many decimals the coordinates shall have. Consider what unit your srid has. If not set, the default value will be used
  • center.x and center.y: these coordinates give the point that the result is ordered from. The idea is to get it ordered from the middle of the map, which makes sense if you are zoomed in and don’t see the whole map
  • inverted_lat_lng: boolean

As shown here, it can be used directly from the websocket. Then there is not even any compiling involved.

I will, as I have said, come up with a post about the technical aspects of the format, but I am afraid that will take some time. Meanwhile I will gladly answer any questions to make it easier to get started.

Nodejs, Websockets and TWKB

April 29th, 2013 by Nicklas Avén

A short update about twkb, described in this earlier post.

I have been doing some testing, sending the geometries from PostGIS to the client as twkb through a websocket.

This is the first time I have been playing with nodejs and websockets. They are really nice things.

Here is the demo:

http://178.79.156.122/twkb_node/

Wait till the page is properly loaded, and click “Municipalities” or “Areal types”. Then you should see the geometries start showing. They should start showing in the middle of the map and go outwards. The neat thing about that is that when you are zoomed in at any place in the map and click “Areal types” for instance, you will almost at once get the geometries where you are zoomed in.

But if you click “Municipalities” before the “Areal types” are finished, you will have to wait a few seconds. That is because all geometries of the first layer are already queued at the client, and I haven’t found any way to manipulate that queue.

Getting the stream ordered by distance to the center of the map is only possible because the geometries are taken directly from the database.
The query for the Municipalities layer looks something like this:

SELECT kommunenr, ST_AsTWKB(geom,3,kommunenr,'NDR') geom
FROM kom_geom
ORDER BY ST_SetSRID(ST_Point($1,$2),4326) <-> geom;

where $1 and $2 are lat long from the center of the map.

I think it works pretty fast. The Municipalities layer has 435 geometries and 149363 vertex-points in total, and the Areal types layer has 3553 geometries and 179025 vertex-points.

Tiny WKB

April 9th, 2013 by Nicklas Avén

Lately I have spent some time on a compressed binary output format from PostGIS. It is so far just some sort of “proof of concept”.

The idea is a binary format with some of the features found in common text-based gis formats. The main features are:

  1. Controllable precision (number of decimals)
  2. Relative coordinates
  3. The ID of the geometry is stored in the geometry

I have called the format Tiny WKB since the wkb format was the closest I know of. But probably the name should be something different since wkb means “well known binary” and this is not “well known”. But Tiny WKB or twkb will have to do for now. I have called the PostGIS function that creates a twkb geometry ST_ASTWKB: ST_ASTWKB(geometry, precision, ID, endianness).
The source code for the TWKB creator is found in http://svn.osgeo.org/postgis/spike/nicklas/twkb.
Just get it with subversion and compile as usual with PostGIS.

So, let’s take a look at the advertised features:

Control over number of decimals

The geometries stored in the database have far more precision than needed for presentation purposes. Often when showing a map on the web, a precision finer than one meter is overkill, even when zoomed in. So, if you are using a meter-based projection and you want just full meter precision, you set the precision parameter to 0. If you want 10 meters precision, you set precision to -1.
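To make the meaning of the precision parameter concrete, here is a small sketch of what it does to a single coordinate value. It assumes plain rounding to the nearest step, which is my reading of the parameter and not something lifted from the PostGIS code; the function name is made up.

```javascript
// Round a coordinate value to the given number of decimals.
// precision 0 -> whole units, precision -1 -> tens, precision 2 -> hundredths.
function roundToPrecision(value, precision) {
  if (precision >= 0) {
    var factor = Math.pow(10, precision);
    return Math.round(value * factor) / factor;
  }
  // Negative precision: round to a multiple of 10, 100, ...
  var step = Math.pow(10, -precision);
  return Math.round(value / step) * step;
}

console.log(roundToPrecision(352414.63, 0));  // full meter precision
console.log(roundToPrecision(352414.63, -1)); // 10 meter precision
```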

Relative coordinates

For instance a line like
‘LINESTRING(352400 6752414, 352415 6752418, 352452 6752402)’
looks like this with relative coordinates:
‘LINESTRING(352400 6752414, 15 4, 37 -16)’

The more coordinates in a geometry, the more space we save by using relative coordinates. This is used in formats like SVG and TopoJSON. It gets more complicated when dealing with a binary format since there is no separator between the numbers. In the wkb format, for instance, that is no problem since all numbers use the same number of bytes. The reader just counts the bytes and knows where the number stops. But then we would gain nothing from our relative coordinates. So twkb handles 3 different storage sizes for the coordinates: 1, 2 or 4 bytes.
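The idea can be sketched in a few lines of javascript. This is an illustration only, not the twkb code: it assumes the coordinates are already integers and that the 1, 2 and 4 byte sizes are ordinary signed integers. All function names are made up.

```javascript
// Keep the first point absolute and store every later point as a delta.
function toRelative(points) {
  return points.map(function (p, i) {
    if (i === 0) return [p[0], p[1]];
    return [p[0] - points[i - 1][0], p[1] - points[i - 1][1]];
  });
}

// Restore absolute coordinates by accumulating the deltas.
function toAbsolute(relative) {
  var out = [];
  relative.forEach(function (p, i) {
    if (i === 0) out.push([p[0], p[1]]);
    else out.push([out[i - 1][0] + p[0], out[i - 1][1] + p[1]]);
  });
  return out;
}

// Smallest of the 1, 2 or 4 byte signed integer sizes that can hold a value.
function bytesNeeded(value) {
  if (value >= -128 && value <= 127) return 1;
  if (value >= -32768 && value <= 32767) return 2;
  return 4;
}

var line = [[352400, 6752414], [352415, 6752418], [352452, 6752402]];
var rel = toRelative(line);
console.log(rel);
console.log(bytesNeeded(line[1][0]), bytesNeeded(rel[1][0]));
```

The absolute x value 352415 needs 4 bytes while its delta 15 fits in 1 byte, which is where the saving comes from.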

Storage of the geometry ID  inside the geometry

In the header of the geometry there is a 4 byte integer for storing an ID of the geometry. This gives some possibilities. For instance we can write an aggregating variant of ST_ASTWKB. Then we can aggregate the twkb geometries to geometry collections grouped by intersection with a grid. Then we get vector tiles directly from the database, with the ID of every single geometry inside the tile. So on the client side each geometry can be identified and joined to its attribute data.

The ID should also make it easier to implement support for topologies. All edges can be sent separately with the ID included.

OK, so how small does it get?

To give some numbers in bytes:

geometry                        WKB   TWKB incl 4 bytes of ID
POINT(1 1)                       21   14
LINESTRING(1 1, 10 15)           41   20
LINESTRING(1 1, 10 15, 22 30)    57   22

Ok, you get the point. Bigger geometries gain more from twkb than smaller ones. But the gain gets smaller if the need for precision is higher. If we take the last example and want to store 3 decimals, the difference is smaller. Also, if the distance between the coordinates increases we need more space to store the relative coordinates. There is also an overhead in changes of sizes. So, to make it extreme:

geometry                               WKB   TWKB incl 4 bytes of ID
LINESTRING(1 1, 1000000 15, 1010 30)    57   36

TWKB is still quite a lot smaller than WKB but the difference is smaller.
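The WKB column in the tables above follows directly from the layout of standard 2D WKB: a 1 byte byte-order flag, a 4 byte geometry type, for linestrings a 4 byte point count, and 8 bytes per ordinate. A small sketch to check the numbers (the function names are made up):

```javascript
// Size in bytes of a 2D geometry in standard WKB:
// 1 byte byte-order flag + 4 byte geometry type, then the payload.
var HEADER = 1 + 4;
var POINT_2D = 2 * 8; // two 8 byte doubles

function wkbPointSize() {
  return HEADER + POINT_2D;
}

function wkbLineStringSize(numPoints) {
  return HEADER + 4 + numPoints * POINT_2D; // 4 byte point count
}

console.log(wkbPointSize(), wkbLineStringSize(2), wkbLineStringSize(3));
```

Every extra point costs 16 bytes in WKB no matter how close it is to the previous one, which is why the size in the last example does not change when the coordinates get far apart.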

But how fast is it?

That is not easy to answer. It takes some overhead to create the TWKB since the geometry has to be analyzed in the database and each coordinate calculated, not just copied. But that overhead seems to disappear in the gain of decreased IO.

I have a layer of all the roads in Norway. Some stats of the layer:
Number of linestrings: 1224248
Total number of vertex points: 23485321

To just check the cost of creating WKB vs TWKB we can do like this (and get the total size as a bonus):
SELECT SUM(LENGTH(ST_ASBinary(geom))) FROM veger;
That takes 979 ms on this machine.

The corresponding query for TWKB looks like this:
SELECT SUM(LENGTH(ST_ASTWKB(geom,0,gid,'NDR'))) FROM veger;
and takes 2770 ms.

So we have an almost 2 seconds overhead.

But if we, instead of just checking the size of the result, actually put the result in a table:

CREATE TABLE wkb_veger as
SELECT ST_ASBinary(geom) geom FROM veger;

takes 6437 ms

and

CREATE TABLE twkb_veger as
SELECT ST_ASTWKB(geom,0,gid,'NDR') geom FROM veger;

takes 3680 ms

So even in the internal handling in the database, the smaller size more than makes up for the overhead.

To be fair we should mention that we reduce the number of decimals to 0. But actually the original layer had no more than 1 decimal of precision, even if there are a lot of trailing zeros. So if we create TWKB with 1 decimal instead, it takes 4346 ms.

The geometries as WKB use 368 mb.

If stored as TWKB with 0 decimals it uses 63 mb, with 1 decimal 79 mb and with 2 decimals (1 trailing zero), 108 mb. Don’t forget that includes 4 bytes of ID to each geometry.

So, the database is quite fast in handling TWKB. But for web-mapping, how about php and javascript?

The DEMO

I think there is quite a lot of optimization to do in my demo. Maybe NodeJS is faster than php for getting the binary data out for example. Also the javascript TWKB reader I have written probably suffers from bad coding.

But it works, and my hope is that other people find this interesting enough to build interesting clients. It would for instance be very interesting to see how QGIS would react to more slimmed geometries. I think it would give new possibilities to get faster rendering.

The demo can be found here:

http://179.78.156.122/twkb

It is a Leaflet map. To turn on the TWKB layers, check the check boxes at the bottom of the page. I have tested in Chrome and Firefox. As you can see from the timer that appears after a TWKB layer is loaded, there are several bottlenecks. I think php seems to work quite slowly here, and so does the addition of the geometries in Leaflet. The reading of the TWKB geometries (parsing) is just a very small part of the time it takes. The layers are stored in srid 4326. The first Municipalities layer has 3 decimals and Municipalities HD has 5 decimals. You can see the difference when zooming close. The “Areal Types” layer also has 5 decimals.

Summary

Now this is just a “proof of concept” in my sandbox in the PostGIS repository.
If this sounds interesting, give some feedback on what is needed to make something good out of it. If you have the possibility, a better and more sophisticated client would be very valuable. As mentioned, QGIS rendering TWKB would be very interesting. If there is an interest I will write a new post describing the technical aspects of the format.

If the interest is big enough it might go into PostGIS some time :-)

The future of PostGISonline.org

December 5th, 2012 by Nicklas Avén

Is there any interest from someone to support or take over the site postgisonline.org?

Background:

About 3 years ago I started postgisonline.org. It is a site that aims to help people find the beauty of spatial sql.

It still has between 300 and 600 unique visits per month, and many of the visitors actually look around on the site and stay for a while. I think that is quite good for a site that has not been refreshed for a few years and doesn’t contain any nudity.

The site runs on a Linode virtual server which costs me 20 $ a month. I restarted the server the other day and found that it had been up for more than 2 years. That means there is a lot of maintenance that should be done. Most importantly, it should be moved to an os version that is not obsolete.

My problem now is that I do not have the amount of spare time that I had when I set this site up, and the 20 $ per month hurts a little.

So, I can see a few options:

1) I close it down in March when the prepaid period at Linode is over (say no, say no)

2) Someone likes the concept and wants to take over. (To develop it in any direction but free to use for anybody)

3) Someone wants to cooperate with me to make things happen

4) Someone wants to pay the cost for hosting (or has some other hosting option) and I will try to do the necessary maintenance

You can give a reply here or write a line to: nicklas.aven@jordogskog.no

All feedback is also welcome on the decision whether to close down or continue, and in what direction if continuing.