Bitten by an exotic bug

I have been tearing my hair for a couple of evenings not understanding anything about what is happening in my code. It has been very frustrating, but now I think I just understood the whole problem.

It might be obvious, but to me it is very exotic and I am still not sure that I have solved the problem.

So, my problem was this:

I am rewriting the memory handling in the twkb function in PostGIS. Why I am doing that is another story that I will tell when things are working as expected.

I have a structure that looks like this:

typedef struct
uint8_t *buf_start;
uint8_t *buf_end;
uint8_t **buf;

Then I wrote a small helper function to do the allocations in this structure.

It looks something like this a little simplified:

int alloc_buffers(BUFFER_STORAGE *buffers, size_t size)
uint8_t *bulk;
bulk=(uint8_t*) lwalloc(size); /*a PostGIS-specific allocator function*/

Ok, do you see the problem?
Well, it bites me in a very subtle way. Nothing happens until, when I am back to the calling function, initiates a new variable. Then things go totally wrong. In my struct the buf-pointer goes bananas.

What I just realized is why.

This is what happens (I think):
In my helper function, when I initialize the bulk-variable it is created on the stack. I mean the pointer variable. The bulk itself is allocated on the heap. Then, when I put the address to the local pointer variable in my structure as **buf, I store a local short lived address, in my long living structure. Then, when I get back to the calling function that address is not valid any more, and is reused when I initialize a new variable in the stack.

So, what I saw was that my new variable in the calling function had the same address as I had stored in my structure. When I used my buf-address it just pointed to a pointer to my new variable instead of the bulk I was expecting.

I cannot see any warning telling me that I am doing something stupid.

Does this makes sense, or will I find tomorrow evening that I still haven’t found my problem?


Open question to Mr Gates, Bill Gates

I saw, some months ago in news and talk shows that you were visiting the nordic countries, talking about your charity work.
I saw some of the interviews and I was left with some questions.

Let’s say, in some situation, you sit down with someone in a local administration for example in a country in southern Africa. This person asks you an honest question:

<em>”Mr Gates, with your background I guess you are the right person to ask; What IT-platform do you recommend our administration (or country) to invest in?
Should we go the Microsoft path or should we invest in an open source platform. Should we invest our money in the right to use Microsoft software or should we invest them in learning and developing systems built on open source solutions?”</em>

If, Mr Gates, you come to the same conclusion as I do, that they gain more in taking control over their IT-systems, investing in understanding and developing them, then I have a second question: Have you re-evaluated your view on software licensing when you now have had the opportunity to travel around to see the world from so many perspectives?

If your answer to the initial question is that they should invest in the Microsoft software suite using Windows, Office, Sql Server, Outlook, Explorer, sharepoint, msn etc etc then I am left with the feeling that your work of today looks more like a promotion gimmick . It would also give me a whole new perspective of cynism.

I know that Microsoft can make great deals to countries that cannot afford full price licenses. But that is not the point. That’s not the point at all actually. I made some searches to defend my point and I found it all written and explained in a very brilliant way from 2007 here:

If you, Mr Gates have re-evaluated your view of software licensing and sees that free and open source software is a very important part of the foundation of a sustainable and fair world, then I will support your work fully. No one can of course oppose the intentions of your efforts to fight polio, colera etc, etc.


Best Regards

Nicklas Avén

Comparing TWKB with compressed geoJSON

One question about TWKB is if there is any gain compared to just compressing geoJSON. That question is worth some investigation. In php for instance you can compress all data on the fly before sending it to the client. Then the browser will decompress the message and use it as normal. This works really good and fast. So, the question is if it is worth the effort to build a twkb-geometry.

I have three reasons why it is.

  1. twkb seems to actually be smaller than compressed geoJSON.
  2. Compressing takes cpu time and power
  3. If you take the geometry directly from the database, there will be a lot more data to send from the database to the web server.

To demonstrate this I have published a web page
It is a little bit messy so I will guide you.

I find the dev tools in Google Chrome nice to compare the sizes and timing of the compressed data.

First press the button “Get available Layers”. That will send a websocket message to the server to return what layers we can use. Then you should get a list in the list box “Choose layer”.

  1. Ok, now choose “Areal Types” (or whatever you want, but just to follow my numbers).

Now you have 4 buttons to choose from. See the upper most row on the picture above. The two buttons to the left will both give you twkb geometries. The websocket button to the left sends twkb “as is” uncompressed, and the second button sends the twkb geometries compressed through php. To the right you get the corresponding buttons for geoJSON.

Behind the twkb-buttons is a query against the database that looks like this:

SELECT ST_AsTWKB(geom,5) FROM prep.n2000_arealtyper;

and the geoJSON query looks like this:

SELECT ST_AsgeoJSON(geom,5) FROM prep.n2000_arealtyper;

So, we are querying the same data in both cases and asks for 5 decimals in both cases. The SRID of the table is 4326, so it is lat lon coordinates. that is why 5 decimals makes sense.

Comparing the web socket and php implementations in timing makes no sense. There is too many unknown in that. But it is interesting to compare the sizes. To see how much it gets compressed.

The sizes of the uncompressed gets displayed in the table under the button. But I have found no way to get the compressed size in javascript. But in Coogle Chrome dev tools you can see it under the Network tab.

Then you see that the compressed version of areal types in geoJSON is 1.2 mb. If you test the uncompressed version of TWKB you will see that it is about 720 kb. geoJSON compresses from 3.6 mb so it gets compressed quite a lot to 1.2 mb. TWKB only compresses to 660 kb from 720 kb so geoJSON compresses much better. But anyway, uncompressed twkb is quite a lot smaller than compressed geoJSON.

The differences will vary depending on dataset and number of decimals, but it seems like TWKB gets smaller all over.

The two second arguments I had in the beginning, why twkb is a better choice is about the server load. I have no good way in investigating what takes time on the server, but how long we have to wait before we get any data from the server says something about the work the server needs to do. In the network tab in the chrome dev tools you can hold the mouse over the bar to the left showing the time spent on getting the data. Then you will get the timing divided in “waiting” and “receiving” like in the picture below.

The first thing you can note is that my connection is quite slow. Getting 1.2 mb takes about 7 seconds today. So, reducing size of content to send over internet is still important, even if you as a developer is sitting on a Gb line. What you also can see, and is more important is that you have to wait 1.37 seconds before you start to get anything back, no matter how fancy internet connection you have. If you do the same thing with twkb, by asking for compressed twkb data you will probably get a timing around 0.8 seconds. So there is a difference of at least 0.5 seconds.

in both cases all the data gets sent from the database to php before anything gets sent. So what we compare is how long it takes for twkb vs geoJSON to:

  1. Be read from disc and twkb vs geoJSON gets constructed in the database
  2. The data sent from the database to php
  3. Php builds the page
  4. It gets compressed

In all those steps 3.6 mb needs to be handled in the geoJSON case compared to 720 kb in the TWKB case. In the TWKB case the size gets already when the twkb geometry gets constructed.

Maybe 0.5 seconds doesn’t sounds that much. But this load is not only affecting me sitting waiting for a map showing up. It affects resources shared by anyone asking for a map from the same server.

You can play around with the different data sets and compare timing and sizes of twkb vs geojson and compressed vs uncompressed.

Some TWKB updates

Last week I gave my first talk about TWKB. One good thing about presenting what you are doing is that it makes you actually do it.

So, what is new?

  • TWKB is in PostGIS trunk! That means it will be included in PostGIS 2.2.
  • ID is optional in the spec. In the PostGIS implementation that means that if you don’t ad an ID there will be no space occupied for an ID. A minimal point is then only 4 bytes.
  • The one and only serialization method is VarInt, the same way as integers gets serialized in proto buffers.
  • I have added a quite generic javascript example of how to read twkb into a geoJSON object
  • The demo page is refreshed and php-examples is added to see the effect of compressing geoJSON compared to twkb. More about that in a separate blog post

Call for brain power

I believe that TWKB can be something good. It can be a very fast format for moving geometries around.

If TWKB is going to be something more than a few demos like here and here, more brain power is needed.

My vision for TWKB is thatin 2013 it will:

  • be supported in Leaflet and OpenLayers 3
  • be in the trunk for relaese in PostGIS 2.2
  • be supported by OGR
  • have more than 5 contributors to the specification

What I have so far with TWKB is collected here. It is a github, divided in 3 parts.

First part is the specification.

Second part is the PostGIS implementation of the spec (Type 1 to 24 is implemenetd)

The last part is the web related scripts, like webserver and client-implementation

All this above is just meant as something to start the discussion from. The goal is to find a very efficient and flexible binary format for geometries.

For me myself I also hope that someone will hire or employ me so I can work with things like this on day time :-) I have a lot of ideas I would like to test.

TWKB aggregates

TWKB (Tiny WKB) can be aggregated and nested. The result is a special type. Since the TWKB-geometry holds it’s own ID (like many text-based gis-formats) , the result of an aggregation of many TWKB-geometries also nests the ID’s into the new aggregated TWKB geometry.

This gives us possibilities like creating a type of vector tiles on the fly. I have tried to demonstrate it here, but I didn’t get it as visual as I had hoped.

Those types is described as type 21-24 in the first draft of TWKB-specification.

A few questions and answers about the last posts

I have had a few questions about TWKB and the websocket DEMO

When does the geoemtries actually load?
It is not obvious when the geoemtries actually gets loaded from the server to the browser.

That happens first time you click a layer. Then the geometries are streamed row by row via websocket to the browser which parses the geometry and adds it to the map, also geometry by geometry. Then if you switch off a layer and back again it is just loaded from Leaflet internally.

Can TWKB handle more than 2 dimmensions
Yes, TWKB can handle up to 7 dimmensions.

Can this websocket approach be used for writing back to the database?
Yes that is easy to implement. Just send the geometry back to nodejs with ws.send(), and insert it to the database. There is no function to import from TWKB into PostGIS. That is no big thing to write, but I don’t think there is the same performance need when posting back to the database, since that will be one or two geometries, not thousands of them. So the easiest is to just send it back as WKT and use ST_Geomfromtext to get it in the database.

Mapservice from Websocket with TWKB

For those who don’t know what I am talking about TWKB is a compressed binary export format from PostGIS described here, and here.

It is just in the experimental stadium. The source for the PostGIS part can be found here

The Mapservice
What I find maybe most interesting is the websocket thing. I haven’t played with that before. Maybe this is old news for all of you out there. But websockets works cross-domain. So, a websocket can be approached from a page on my webserver or from a page on your desktop. That makes no difference.

You can download the index.htm from the demo:
put it someone on your computer, open it with a browser and it should load the maps.

You can also put this in a javascript:

var ws = new WebSocket('ws://');

and you should get a reply if you use one of the map names that my “service” has.

If you don’t know the map names you can send:


and you will get back a json-object with some metadata (That is demonstrated in the demo too)

All this is very unfinished, but it shows the idea.

Test it in OpenLayers too
The demo I have written is in Leaflet. That is just because it seemed easier to get started to test this in Leaflet. But it would be very interesting if someone took TWKB for a ride with OpenLayers (3). Since OpenLayers 3 promises webGL support a compressed binary format ought to be interesting.

What parameters the websocket takes
This is not even tested all of it, but here is what you can send to the wesocket:

srid (if not passed the srid of the table will be used, which can be found as default_srid in with getcapabilities)
precision how many decimals the coordinates shall have. Consider what unit your srid has. If not set the default value will be used
center.x & center.y Those coordinates gives the point that the result is ordered from. The idea is to get it order from the middle of the map, which makes sence if you are zoomed in and don’t see the whole map
inverted_lat_lng, boolean

As showed here it can be used directly from the websocket. Then there is not even any compiling involved.

I will, as I have said come up with a post about the technical aspects of the format, but I am afraid that will take some time. Meanwhile I will gladly answer any questions to make it easier getting started.

Nodejs, Websockets and TWKB

A short update about twkb, described in this earlier post

I have been doing some testing, sending the geometries from PostGIS to the client as twkb through a websocket.

This is the first time I have been playing with nodejs and websockets. It is really nice things.

Here is the demo:

Wait till the page is properly loaded, and click “Municipalities” or “Areal types”. then you should see the geometries start showing. It should start showing in the middle of the map and going outwards. The neat thing about that is that when you are zoomed in at any place in the map and click “Areal types” for instance, you will almost at once get the geometries where you are zoomed in.

But if you click Municipalities before the “Areal types” are finished, you will have to wait a few seconds. That is because all geometries of the first layer is already queued at the client, and I haven’t found any way to manipulate that queue.

To get the stream ordered by distance to the center of the map is only possible because the geometries is taken directly from the database.
The query for the Municipalities layer looks something like this:

SELECT kommunenr, ST_Astwkb(geom,3,kommunenr,'NDR') geom
FROM kom_geom
ORDER BY ST_Setsrid(ST_Point($1,$2),4326) geom;

where $1 and $2 is lat long from the center of the map.

I think it works pretty fast. Municipalities layer has 435 geometries and 149363 vertex-points in total, and the Areal types layer has 3553 geometries and 179025 vertex-points.