Open question to Mr Gates, Bill Gates

Some months ago I saw in the news and on talk shows that you were visiting the Nordic countries, talking about your charity work.
I watched some of the interviews and was left with some questions.

Let’s say you sit down with someone in a local administration, for example in a country in southern Africa. This person asks you an honest question:

<em>”Mr Gates, with your background I guess you are the right person to ask: what IT platform do you recommend our administration (or country) to invest in?
Should we go the Microsoft path or should we invest in an open source platform? Should we invest our money in the right to use Microsoft software, or should we invest it in learning and developing systems built on open source solutions?”</em>

If, Mr Gates, you come to the same conclusion as I do, that they gain more by taking control of their IT systems and investing in understanding and developing them, then I have a second question: Have you re-evaluated your view on software licensing, now that you have had the opportunity to travel around and see the world from so many perspectives?

If your answer to the initial question is that they should invest in the Microsoft software suite, using Windows, Office, SQL Server, Outlook, Explorer, SharePoint, MSN etc., then I am left with the feeling that your work of today looks more like a promotional gimmick. It would also give me a whole new perspective on cynicism.

I know that Microsoft can offer great deals to countries that cannot afford full-price licenses. But that is not the point. That’s not the point at all, actually. I did some searching to back up my point and found it all brilliantly written and explained in this article from 2007:
http://freeknowledge.eu/documents/articles/free_milk

If you, Mr Gates, have re-evaluated your view of software licensing and see that free and open source software is a very important part of the foundation of a sustainable and fair world, then I will support your work fully. No one can of course oppose the intentions of your efforts to fight polio, cholera etc.

 

Best Regards

Nicklas Avén

Comparing TWKB with compressed GeoJSON

One question about TWKB is whether there is any gain compared to just compressing GeoJSON. That question is worth some investigation. In PHP, for instance, you can compress all data on the fly before sending it to the client. The browser then decompresses the message and uses it as normal. This works really well and is fast. So the question is whether it is worth the effort to build a TWKB geometry.

I have three reasons why it is.

  1. TWKB seems to actually be smaller than compressed GeoJSON.
  2. Compressing takes CPU time and power.
  3. If you take the geometry directly from the database, GeoJSON means a lot more data to send from the database to the web server.

To demonstrate this I have published a web page http://178.79.156.122/twkb_test.
It is a little bit messy so I will guide you.

I find the dev tools in Google Chrome nice for comparing the sizes and timing of the compressed data.

First press the button “Get available Layers”. That sends a websocket message to the server, asking which layers we can use. You should then get a list in the list box “Choose layer”.

Ok, now choose “Areal Types” (or whatever you want, but choose that one to follow my numbers).

Now you have 4 buttons to choose from. See the uppermost row in the picture above. The two buttons to the left both give you TWKB geometries. The leftmost (websocket) button sends TWKB “as is”, uncompressed, and the second button sends the TWKB geometries compressed through PHP. To the right you get the corresponding buttons for GeoJSON.

Behind the twkb-buttons is a query against the database that looks like this:

SELECT ST_AsTWKB(geom,5) FROM prep.n2000_arealtyper;

and the GeoJSON query looks like this:

SELECT ST_AsGeoJSON(geom,5) FROM prep.n2000_arealtyper;

So we are querying the same data in both cases and ask for 5 decimals in both cases. The SRID of the table is 4326, so the coordinates are lat/lon. That is why 5 decimals makes sense.

Comparing the timing of the websocket and PHP implementations makes no sense. There are too many unknowns in that. But it is interesting to compare the sizes, to see how much the data gets compressed.

The uncompressed sizes are displayed in the table under the button. I have found no way to get the compressed size in JavaScript, but in the Google Chrome dev tools you can see it under the Network tab.

There you see that the compressed version of Areal Types in GeoJSON is 1.2 MB. If you test the uncompressed version of TWKB you will see that it is about 720 kB. GeoJSON compresses from 3.6 MB to 1.2 MB, so it gets compressed quite a lot. TWKB only compresses from 720 kB to 660 kB, so GeoJSON compresses much better. But even so, uncompressed TWKB is quite a lot smaller than compressed GeoJSON.

The differences will vary depending on the dataset and the number of decimals, but it seems like TWKB comes out smaller across the board.
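To put the numbers for the Areal Types layer side by side, here is a small JavaScript calculation. It uses nothing but the four sizes quoted above, and it assumes that 1 MB is 1000 kB; the variable and function names are made up for the example.

```javascript
// Sizes quoted above for the "Areal Types" layer, in kB
// (assumption: 1 MB is taken as 1000 kB).
const sizes = {
  geojson: { raw: 3600, compressed: 1200 },
  twkb: { raw: 720, compressed: 660 }
};

// Fraction of the original size that is left after compression.
function remaining(format) {
  return sizes[format].compressed / sizes[format].raw;
}

const geojsonLeft = remaining('geojson');
const twkbLeft = remaining('twkb');

// Uncompressed TWKB relative to compressed GeoJSON.
const twkbVsCompressedGeojson = sizes.twkb.raw / sizes.geojson.compressed;

console.log(geojsonLeft.toFixed(2));             // 0.33
console.log(twkbLeft.toFixed(2));                // 0.92
console.log(twkbVsCompressedGeojson.toFixed(2)); // 0.60
```

So GeoJSON shrinks to about a third when compressed, TWKB hardly shrinks at all, and uncompressed TWKB is still only about 60 % of the compressed GeoJSON.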

The second and third arguments I gave in the beginning for why TWKB is a better choice are about the server load. I have no good way of investigating what takes time on the server, but how long we have to wait before we get any data from the server says something about the work the server needs to do. In the Network tab in the Chrome dev tools you can hold the mouse over the bar to the left showing the time spent on getting the data. Then you get the timing divided into “waiting” and “receiving”, like in the picture below.

The first thing you can note is that my connection is quite slow. Getting 1.2 MB takes about 7 seconds today. So reducing the size of the content sent over the internet is still important, even if you as a developer are sitting on a Gb line. What you can also see, and what is more important, is that you have to wait 1.37 seconds before you start to get anything back, no matter how fancy an internet connection you have. If you do the same thing with TWKB, by asking for compressed TWKB data, you will probably get a timing around 0.8 seconds. So there is a difference of at least 0.5 seconds.

In both cases all the data gets sent from the database to PHP before anything gets sent to the client. So what we compare is how long it takes for TWKB vs GeoJSON to go through these steps:

  1. The data is read from disk and the TWKB or GeoJSON is constructed in the database
  2. The data is sent from the database to PHP
  3. PHP builds the page
  4. The page gets compressed

In all those steps 3.6 MB needs to be handled in the GeoJSON case, compared to 720 kB in the TWKB case. In the TWKB case the size is reduced already when the TWKB geometry gets constructed.

Maybe 0.5 seconds doesn’t sound like much. But this load does not only affect me, sitting waiting for a map to show up. It affects resources shared by everyone asking for a map from the same server.

You can play around with the different data sets and compare timing and sizes of TWKB vs GeoJSON and compressed vs uncompressed.

Some TWKB updates

Last week I gave my first talk about TWKB. One good thing about presenting what you are doing is that it makes you actually do it.

So, what is new?

  • TWKB is in PostGIS trunk! That means it will be included in PostGIS 2.2.
  • ID is optional in the spec. In the PostGIS implementation that means that if you don’t add an ID, no space is occupied for an ID. A minimal point is then only 4 bytes.
  • The one and only serialization method is VarInt, the same way integers get serialized in Protocol Buffers.
  • I have added a quite generic JavaScript example of how to read TWKB into a GeoJSON object.
  • The demo page http://178.79.156.122/twkb_test is refreshed, and PHP examples are added to show the effect of compressing GeoJSON compared to TWKB. More about that in a separate blog post.
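As a rough illustration of the VarInt idea, here is a sketch of the base-128 varint used by Protocol Buffers, combined with the zigzag mapping that Protocol Buffers uses for signed integers. Whether TWKB maps signed values in exactly this way is an assumption in this sketch; the specification is the authority. The function names are made up for the example, and it only handles values that fit in 32 bits.

```javascript
// Zigzag maps signed integers to unsigned ones so that small negative
// numbers also become small: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
// (assumption: signed values are handled like the Protocol Buffers sint types)
function zigzagEncode(n) {
  return ((n << 1) ^ (n >> 31)) >>> 0;
}

function zigzagDecode(u) {
  return (u >>> 1) ^ -(u & 1);
}

// Base-128 varint: 7 payload bits per byte, least significant group first.
// The high bit of a byte is set when more bytes follow.
function writeVarint(u, out) {
  while (u > 0x7f) {
    out.push((u & 0x7f) | 0x80);
    u >>>= 7;
  }
  out.push(u);
  return out;
}

// Returns the decoded value and the position of the next unread byte.
function readVarint(bytes, pos) {
  let value = 0;
  let shift = 0;
  let byte;
  do {
    byte = bytes[pos++];
    value += (byte & 0x7f) * Math.pow(2, shift);
    shift += 7;
  } while (byte & 0x80);
  return { value: value, next: pos };
}

const encoded = writeVarint(zigzagEncode(-16), []);  // a small delta: 1 byte
const big = writeVarint(zigzagEncode(352400), []);   // a full coordinate: 3 bytes

console.log(encoded.length, big.length);                // 1 3
console.log(zigzagDecode(readVarint(big, 0).value));    // 352400
```

The point of the scheme is that small numbers, such as the differences between neighbouring vertices, take one byte, while larger numbers simply use more bytes without any fixed size having to be announced in advance.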

Call for brain power

I believe that TWKB can be something good. It can be a very fast format for moving geometries around.

If TWKB is going to be something more than a few demos like here and here, more brain power is needed.

My vision for TWKB is that in 2013 it will:

  • be supported in Leaflet and OpenLayers 3
  • be in the trunk for release in PostGIS 2.2
  • be supported by OGR
  • have more than 5 contributors to the specification

What I have so far with TWKB is collected here. It is a GitHub repository, divided into 3 parts.

First part is the specification.

The second part is the PostGIS implementation of the spec (types 1 to 24 are implemented).

The last part is the web-related scripts, like the web server and the client implementation.

All this above is just meant as something to start the discussion from. The goal is to find a very efficient and flexible binary format for geometries.

Personally, I also hope that someone will hire or employ me so I can work on things like this in the daytime :-) I have a lot of ideas I would like to test.

TWKB aggregates

TWKB (Tiny WKB) can be aggregated and nested. The result is a special type. Since the TWKB geometry holds its own ID (like many text-based GIS formats), the result of an aggregation of many TWKB geometries also nests the IDs into the new aggregated TWKB geometry.

This gives us possibilities like creating a type of vector tiles on the fly. I have tried to demonstrate it here, but I didn’t get it as visual as I had hoped.

Those types are described as type 21-24 in the first draft of the TWKB specification.

A few questions and answers about the last posts

I have had a few questions about TWKB and the websocket DEMO.

When do the geometries actually load?
It is not obvious when the geometries actually get loaded from the server to the browser.

That happens the first time you click a layer. Then the geometries are streamed row by row via websocket to the browser, which parses each geometry and adds it to the map, also geometry by geometry. If you then switch a layer off and back on again, it is just loaded from Leaflet internally.

Can TWKB handle more than 2 dimensions?
Yes, TWKB can handle up to 7 dimensions.

Can this websocket approach be used for writing back to the database?
Yes, that is easy to implement. Just send the geometry back to nodejs with ws.send() and insert it into the database. There is no function to import from TWKB into PostGIS. That is no big thing to write, but I don’t think there is the same performance need when posting back to the database, since that will be one or two geometries, not thousands of them. So the easiest way is to just send it back as WKT and use ST_GeomFromText to get it into the database.
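A minimal sketch of the client side of that idea: turn the coordinates into WKT and wrap them in a JSON message for ws.send(). The message fields (request, map_name, wkt), the helper names and the table name are all hypothetical; the demo server does not implement anything like this. On the server, the WKT would be passed to ST_GeomFromText as a query parameter.

```javascript
// Build a WKT linestring from an array of [x, y] pairs.
function lineToWkt(coords) {
  const parts = coords.map(function (c) { return c[0] + ' ' + c[1]; });
  return 'LINESTRING(' + parts.join(', ') + ')';
}

// Hypothetical message layout; the field names are made up for this example.
function buildInsertMessage(mapName, coords) {
  return JSON.stringify({
    request: 'insert',
    map_name: mapName,
    wkt: lineToWkt(coords)
  });
}

// SQL the server side could run, with the WKT passed as parameter $1.
// some_table is a placeholder, and SRID 4326 is assumed.
const insertSql = 'INSERT INTO some_table (geom) VALUES (ST_GeomFromText($1, 4326))';

const message = buildInsertMessage('roads', [[10.5, 59.9], [10.6, 60.1]]);
console.log(message);
// In the browser, on an open socket: ws.send(message);
```

Passing the WKT as a parameter rather than pasting it into the SQL string keeps whatever the client sends from being interpreted as SQL.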

Mapservice from Websocket with TWKB

For those who don’t know what I am talking about: TWKB is a compressed binary export format from PostGIS, described here and here.

It is just at the experimental stage. The source for the PostGIS part can be found here.

The Mapservice
What I find maybe most interesting is the websocket thing. I haven’t played with that before. Maybe this is old news for all of you out there, but websockets work cross-domain. So a websocket can be reached from a page on my web server or from a page on your desktop. That makes no difference.

You can download the index.htm from the demo:
The DEMO
Put it somewhere on your computer, open it with a browser and it should load the maps.

You can also put this in your own JavaScript:

var ws = new WebSocket('ws://178.79.156.122:8088');
ws.onopen = function () { // send only once the connection is open
  ws.send(JSON.stringify({"nr":nr,"map_name":map_name}));
};

and you should get a reply if you use one of the map names that my “service” has.

If you don’t know the map names you can send:

ws.send(JSON.stringify({"request":"getcapabilities"}));

and you will get back a JSON object with some metadata (that is demonstrated in the demo too).

All this is very unfinished, but it shows the idea.

Test it in OpenLayers too
The demo I have written is in Leaflet. That is just because it seemed easier to get started testing this in Leaflet. But it would be very interesting if someone took TWKB for a ride with OpenLayers (3). Since OpenLayers 3 promises WebGL support, a compressed binary format ought to be interesting.

What parameters the websocket takes
Not all of this is even tested, but here is what you can send to the websocket:

map_name
srid: if not passed, the srid of the table is used, which can be found as default_srid with getcapabilities
precision: how many decimals the coordinates shall have. Consider what unit your srid has. If not set, the default value is used
center.x & center.y: the coordinates of the point that the result is ordered from. The idea is to get it ordered from the middle of the map, which makes sense if you are zoomed in and don’t see the whole map
inverted_lat_lng: boolean
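Put together, a request using these parameters could be built like this. It is only a sketch: it assumes that center.x and center.y are sent as a nested object, {"center": {"x": ..., "y": ...}}, which the list above does not spell out. The function name, the option names and the example values are made up.

```javascript
// Build the JSON text for a map request.
// Parameters that are left out are simply not sent.
function buildMapRequest(mapName, options) {
  const opts = options || {};
  const request = { map_name: mapName };
  if (opts.srid !== undefined) request.srid = opts.srid;
  if (opts.precision !== undefined) request.precision = opts.precision;
  // Assumption: the center point is sent as a nested object.
  if (opts.center) request.center = { x: opts.center.x, y: opts.center.y };
  if (opts.invertedLatLng !== undefined) request.inverted_lat_lng = opts.invertedLatLng;
  return JSON.stringify(request);
}

const payload = buildMapRequest('some_layer', {
  precision: 5,
  center: { x: 10.75, y: 59.91 }
});
console.log(payload);
// In the browser, once the socket is open: ws.send(payload);
```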

As shown here, it can be used directly from the websocket. Then there is not even any compiling involved.

I will, as I have said, come up with a post about the technical aspects of the format, but I am afraid that will take some time. Meanwhile I will gladly answer any questions to make it easier to get started.

Nodejs, Websockets and TWKB

A short update about TWKB, described in this earlier post.

I have been doing some testing, sending the geometries from PostGIS to the client as TWKB through a websocket.

This is the first time I have been playing with nodejs and websockets. They are really nice things.

Here is the demo:

http://178.79.156.122/twkb_node/

Wait till the page is properly loaded, and click “Municipalities” or “Areal types”. Then you should see the geometries start showing. They should start showing in the middle of the map and go outwards. The neat thing about that is that when you are zoomed in at any place in the map and click “Areal types”, for instance, you will almost at once get the geometries where you are zoomed in.

But if you click Municipalities before the “Areal types” are finished, you will have to wait a few seconds. That is because all geometries of the first layer are already queued at the client, and I haven’t found any way to manipulate that queue.

Getting the stream ordered by distance to the center of the map is only possible because the geometries are taken directly from the database.
The query for the Municipalities layer looks something like this:

SELECT kommunenr, ST_Astwkb(geom,3,kommunenr,'NDR') geom
FROM kom_geom
ORDER BY ST_Setsrid(ST_Point($1,$2),4326) <-> geom;

where $1 and $2 are the longitude and latitude of the center of the map (ST_Point takes x first, then y).

I think it works pretty fast. The Municipalities layer has 435 geometries and 149363 vertex-points in total, and the Areal types layer has 3553 geometries and 179025 vertex-points.

Tiny WKB

Lately I have spent some time on a compressed binary output format from PostGIS. It is so far just some sort of “proof of concept”.

The idea is a binary format with some of the features found in common text-based GIS formats. The main features are:

  1. Controllable precision (number of decimals)
  2. Relative coordinates
  3. The ID of the geometry is stored in the geometry

I have called the format Tiny WKB since the WKB format was the closest I know of. But the name should probably be something different, since WKB means “well known binary” and this is not “well known”. But Tiny WKB or TWKB will have to do for now. The PostGIS function that creates a TWKB geometry I have called ST_ASTWKB: ST_ASTWKB(geometry, precision, ID, endianness).
The source code for the TWKB creator is found in http://svn.osgeo.org/postgis/spike/nicklas/twkb.
Just get it with Subversion and compile as usual with PostGIS.

So, let’s take a look at the advertised features:

Control over number of decimals

The geometries stored in the database have far more precision than needed for presentation purposes. Often when showing a map on the web, precision finer than one meter is overkill, even when zoomed in. So if you are using a meter-based projection and want just full-meter precision, you set the precision parameter to 0. If you want 10 meter precision, you set precision to -1.
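The effect of the precision parameter can be sketched as scaling by a power of ten and rounding to an integer. This is an illustration of the idea in JavaScript, not the PostGIS code; the function names and the example coordinate are made up.

```javascript
// Scale a coordinate by 10^precision and round to the nearest integer.
// precision 0 keeps whole units, 1 keeps one decimal, -1 keeps tens.
function toStoredInteger(value, precision) {
  return Math.round(value * Math.pow(10, precision));
}

// Reverse the scaling to get an approximate coordinate back.
function fromStoredInteger(stored, precision) {
  return stored / Math.pow(10, precision);
}

const x = 352414.37; // meters in a meter-based projection

const wholeMeters = toStoredInteger(x, 0);  // 352414
const tenMeters = toStoredInteger(x, -1);   // 35241, standing for about 352410 m

console.log(wholeMeters, tenMeters);
console.log(Math.round(fromStoredInteger(tenMeters, -1))); // 352410
```

The lower the precision, the smaller the integers that have to be stored, which is where the size gain comes from.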

Relative coordinates

For instance, a line like
‘LINESTRING(352400 6752414, 352415 6752418, 352452 6752402)’
looks like this with relative coordinates:
‘LINESTRING(352400 6752414, 15 4, 37 -16)’

The more coordinates in a geometry, the more space we save by using relative coordinates. This is used in formats like SVG and TopoJSON. It gets more complicated when dealing with a binary format, since there is no separator between the numbers. In the WKB format, for instance, that is no problem since all numbers use the same number of bytes. The reader just counts the bytes and knows where the number stops. But if we did the same, we would gain nothing from our relative coordinates. So TWKB handles 3 different storage sizes for the coordinates: 1, 2 or 4 bytes.
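The relative coordinates and the choice of storage size can be sketched in a few lines of JavaScript. The sketch assumes that the values are stored as plain signed integers of 1, 2 or 4 bytes; how the sizes are signalled in the byte stream is left out, and the function names are made up.

```javascript
// Turn absolute [x, y] pairs into a first absolute point followed by
// the differences to the previous point.
function toRelative(coords) {
  return coords.map(function (c, i) {
    if (i === 0) return [c[0], c[1]];
    return [c[0] - coords[i - 1][0], c[1] - coords[i - 1][1]];
  });
}

// Reverse: add the differences up again.
function toAbsolute(deltas) {
  const out = [];
  deltas.forEach(function (d, i) {
    if (i === 0) out.push([d[0], d[1]]);
    else out.push([out[i - 1][0] + d[0], out[i - 1][1] + d[1]]);
  });
  return out;
}

// Smallest signed integer size (1, 2 or 4 bytes) that can hold a value.
function bytesNeeded(value) {
  if (value >= -128 && value <= 127) return 1;
  if (value >= -32768 && value <= 32767) return 2;
  return 4;
}

const line = [[352400, 6752414], [352415, 6752418], [352452, 6752402]];
const relative = toRelative(line);

console.log(JSON.stringify(relative)); // [[352400,6752414],[15,4],[37,-16]]
console.log(bytesNeeded(37), bytesNeeded(352400)); // 1 4
```

Only the first point needs the full 4 bytes per value here; every following value fits in a single byte.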

Storage of the geometry ID inside the geometry

In the header of the geometry there is a 4 byte integer for storing an ID of the geometry. This opens up some possibilities. For instance, we can write an aggregating variant of ST_ASTWKB. Then we can aggregate the TWKB geometries into geometry collections grouped by intersection with a grid. That gives us vector tiles directly from the database, with the ID of each single geometry inside the tile. So on the client side each geometry can be identified and joined to its attribute data.

The ID should also make it easier to implement support for topologies. All edges can be sent separately with the ID included.

OK, so how small does it get?

To give some numbers in bytes:

geometry                         WKB   TWKB (incl. 4 bytes of ID)
POINT(1 1)                       21    14
LINESTRING(1 1, 10 15)           41    20
LINESTRING(1 1, 10 15, 22 30)    57    22
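
The WKB column can be checked by hand. A 2D WKB geometry starts with 1 byte for the byte order and 4 bytes for the geometry type, a linestring adds a 4 byte point count, and every coordinate value is an 8 byte double. The sketch below only covers these two 2D cases; the names are made up.

```javascript
const HEADER = 1 + 4;    // byte order flag + geometry type
const POINT_2D = 2 * 8;  // two 8 byte doubles

function wkbPointSize() {
  return HEADER + POINT_2D;
}

function wkbLineStringSize(numPoints) {
  return HEADER + 4 + numPoints * POINT_2D; // 4 bytes for the point count
}

console.log(wkbPointSize());       // 21
console.log(wkbLineStringSize(2)); // 41
console.log(wkbLineStringSize(3)); // 57
```

Every extra vertex costs 16 bytes in WKB, no matter how close it is to the previous one.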

Ok, you get the point. Bigger geometries gain more from TWKB than smaller ones. But the gain gets smaller if the need for precision is higher. If we take the last example and want to store 3 decimals, the difference is smaller. Also, if the distance between the coordinates increases, we need more space to store the relative coordinates. There is also an overhead when the storage size changes. So, to make it extreme:

geometry                                WKB   TWKB (incl. 4 bytes of ID)
LINESTRING(1 1, 1000000 15, 1010 30)    57    36

TWKB is still quite a lot smaller than WKB, but the difference is smaller.

But how fast is it?

That is not easy to answer. It takes some overhead to create the TWKB, since the geometry has to be analyzed in the database and each coordinate calculated, not just copied. But that overhead seems to disappear in the gain from decreased IO.

I have a layer of all the roads in Norway. Some stats of the layer:
Number of linestrings: 1224248
Total number of vertex points: 23485321

To check just the cost of creating WKB vs TWKB we can do like this (and get the total size as a bonus):
SELECT SUM(LENGTH(ST_ASBinary(geom))) FROM veger;
On this machine that takes 979 ms.

The corresponding query for TWKB looks like this:
SELECT SUM(LENGTH(ST_ASTWKB(geom,0,gid,'NDR'))) FROM veger;
and takes 2770 ms.

So we have an overhead of almost 2 seconds.

But if, instead of just checking the size of the result, we actually put the result in a table, the picture changes:

CREATE TABLE wkb_veger as
SELECT ST_ASBinary(geom) geom FROM veger;

takes 6437 ms

and

CREATE TABLE twkb_veger as
SELECT ST_ASTWKB(geom,0,gid,'NDR') geom FROM veger;

takes 3680 ms

So the smaller size, even in the internal handling in the database, more than eats up the overhead.

To be fair, we should mention that we reduce the number of decimals to 0. But the original layer actually had no more than 1 decimal of precision, even if there are a lot of trailing zeros. So if we create TWKB with 1 decimal instead, it takes 4346 ms.

The geometries as WKB use 368 MB.

Stored as TWKB with 0 decimals they use 63 MB, with 1 decimal 79 MB and with 2 decimals (1 trailing zero) 108 MB. Don’t forget that this includes 4 bytes of ID for each geometry.
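As a sanity check on the WKB figure: with 1 + 4 + 4 bytes of header per 2D linestring and 16 bytes per vertex, the layer statistics above predict the total size. The sketch assumes that the sizes are given in units of 1024 × 1024 bytes and that all geometries are plain 2D linestrings; the names are made up.

```javascript
const numLines = 1224248;
const numVertices = 23485321;
const MIB = 1024 * 1024;

// 2D WKB linestring: 9 bytes of header plus 16 bytes per vertex.
const wkbBytes = numLines * 9 + numVertices * 16;
const wkbMiB = wkbBytes / MIB;

// Average number of bytes per vertex for a reported total size.
function bytesPerVertex(sizeInMiB) {
  return sizeInMiB * MIB / numVertices;
}

console.log(wkbMiB.toFixed(1));              // 368.9
console.log(bytesPerVertex(368).toFixed(1)); // 16.4 (WKB)
console.log(bytesPerVertex(63).toFixed(1));  // 2.8  (TWKB, 0 decimals)
```

The predicted WKB size agrees with the 368 reported above, and the TWKB with 0 decimals works out to just under 3 bytes per vertex, ID included.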

So, the database is quite fast at handling TWKB. But for web mapping, how about PHP and JavaScript?

The DEMO

I think there is quite a lot of optimization to do in my demo. Maybe NodeJS is faster than PHP at getting the binary data out, for example. Also, the JavaScript TWKB reader I have written probably suffers from bad coding.

But it works, and my hope is that other people find this interesting enough to build interesting clients. It would for instance be very interesting to see how QGIS would react to slimmer geometries. I think it would give new possibilities for faster rendering.

The demo can be found here:

http://179.78.156.122/twkb

It is a Leaflet map. To turn on the TWKB layers, check the check boxes at the bottom of the page. I have tested in Chrome and Firefox. As you can see from the timer that appears after a TWKB layer is loaded, there are several bottlenecks. I think PHP seems to work quite slowly here, and so does the addition of the geometries in Leaflet. The reading of the TWKB geometries (parsing) is just a very small part of the time it takes. The layers are stored in srid 4326. The first Municipalities layer has 3 decimals and Municipalities HD has 5 decimals. You can see the difference when zooming in close. The “Areal Types” layer also has 5 decimals.

Summary

For now this is just a “proof of concept” in my sandbox in the PostGIS repository.
If this sounds interesting, give some feedback on what is needed to make something good out of it. If you have the possibility, a better and more sophisticated client would be very valuable. As mentioned, QGIS rendering TWKB would be very interesting. If there is interest I will write a new post describing the technical aspects of the format.

If the interest is big enough it might go into PostGIS some time :-)