Election data maps
Having built a representation of the county council election results from data scraped from the Warwickshire Web, I wanted to add maps to it, especially as Ordnance Survey recently released their Boundary Line dataset, which contains a whole bunch of boundaries including the county electoral divisions.
I wasn't quite sure what I wanted to end up with - an outlined map of the area on each division page would be a start.
Just to say I know very little about GIS or geodata generally - I've plotted some points on Google Maps previously, but that's about as far as I've gone before this project. So with that, if I've misrepresented anything or got anything badly wrong, do let me know.
This work relies on the Boundary Line dataset provided as open data by Ordnance Survey. Here's the copyright notice, and there's more info on their licensing page.
Contains Ordnance Survey data © Crown copyright and database right 2010
The county electoral division dataset comes in a 46MB ESRI Shapefile, which is a pretty standard format for this kind of data. It contains all of the polygons for the county councils across England - as I'm only after the Warwickshire stuff, I'll need to chuck most of that away.
Here's a representation of the shapefile in Quantum GIS, which can read ESRI Shapefiles. Warwickshire is picked out in yellow.
I had a vague idea about getting the shapefile converted into KML so I could chop it down to size and manipulate it. Something I learned early was that the data in the shapefile is in eastings and northings - KML needs latitude and longitude.
To do this conversion I used the command line utility ogr2ogr, which is part of the open-source GDAL (Geospatial Data Abstraction Library). ogr2ogr apparently "...can bring joy and a more peaceful nature into your life." Now, I don't want joy or peace, I just want eastings/northings converted into lat/long - here goes:
ogr2ogr -s_srs EPSG:27700 -t_srs EPSG:4326 destination.shp source.shp
Which is great and all, but it's still a shapefile. Luckily there's also a command for turning .shp files into .kml.
ogr2ogr -f "KML" destination.kml source.shp
So from a 46MB shapefile, we end up with a 108MB KML file.
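As an aside, the two ogr2ogr steps above can probably be combined into a single call, since ogr2ogr accepts the output format and the reprojection options together. Here is a minimal Python sketch that builds that command. It is an illustration rather than what was actually run for this project; the file names are the placeholders used above and the helper names are hypothetical.

```python
import shutil
import subprocess


def build_ogr2ogr_command(source_shp, destination_kml):
    """Build one ogr2ogr call that reprojects and converts to KML."""
    return [
        "ogr2ogr",
        "-f", "KML",             # output format
        "-s_srs", "EPSG:27700",  # source: British National Grid (eastings/northings)
        "-t_srs", "EPSG:4326",   # target: WGS84 latitude/longitude
        destination_kml,         # ogr2ogr takes the destination first...
        source_shp,              # ...and the source second
    ]


def convert(source_shp, destination_kml):
    """Run the conversion; needs GDAL's ogr2ogr on the PATH."""
    if shutil.which("ogr2ogr") is None:
        raise RuntimeError("ogr2ogr not found; is GDAL installed?")
    subprocess.run(build_ogr2ogr_command(source_shp, destination_kml), check=True)


# Only build and show the command here; call convert() to actually run it.
command = build_ogr2ogr_command("source.shp", "destination.kml")
print(" ".join(command))
```

Keeping the command builder separate from the function that runs it makes it easy to check the arguments before touching a 46MB file.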
The KML file
The KML file is just a type of XML file, for use in Google Earth. Each polygon looks rather like this, only with lots (and lots) of co-ordinates - over 3000 in the case of one of the divisions:
<Placemark>
  <Style>
    <LineStyle>
      <color>ff0000ff</color>
    </LineStyle>
    <PolyStyle>
      <fill>0</fill>
    </PolyStyle>
  </Style>
  <ExtendedData>
    <SchemaData schemaUrl="#electoral_division_latlng">
      <SimpleData name="FID">1558</SimpleData>
    </SchemaData>
  </ExtendedData>
  <Polygon>
    <outerBoundaryIs>
      <LinearRing>
        <coordinates>-1.667473197956685,52.164132540593961 (and lots more....)</coordinates>
      </LinearRing>
    </outerBoundaryIs>
  </Polygon>
</Placemark>
The co-ordinate pairs are in long,lat order rather than lat,long, and each pair is separated from the next by a space. Also as part of this data we have styling info, which I'm sure was done for good engineering reasons, but I wonder whether that would be better separated out. Anyway.
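To make the long,lat ordering concrete, here is a minimal Python sketch that pulls the FID and the points out of one Placemark and flips each pair into lat,long. It is an illustration only, not the code used for this project. The sample is a cut-down version of the Placemark above; every point after the first is made up so that the ring closes.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def parse_placemark(placemark):
    """Return (fid, [(lat, lon), ...]) for one <Placemark> element.

    Points from every <coordinates> element in the Placemark are merged
    into one list, which is only right for simple single-ring polygons.
    """
    fid = None
    points = []
    for el in placemark.iter():
        name = local(el.tag)
        if name == "SimpleData" and el.get("name") == "FID":
            fid = int(el.text)
        elif name == "coordinates" and el.text:
            # Pairs are separated by whitespace, values within a pair by commas.
            for pair in el.text.split():
                parts = pair.split(",")
                lon, lat = float(parts[0]), float(parts[1])  # KML order: long, lat
                points.append((lat, lon))
    return fid, points


# Illustrative sample: only the first point comes from the real data.
sample = """<Placemark>
  <ExtendedData><SchemaData schemaUrl="#electoral_division_latlng">
    <SimpleData name="FID">1558</SimpleData>
  </SchemaData></ExtendedData>
  <Polygon><outerBoundaryIs><LinearRing>
    <coordinates>-1.667473197956685,52.164132540593961 -1.66,52.17 -1.65,52.16 -1.667473197956685,52.164132540593961</coordinates>
  </LinearRing></outerBoundaryIs></Polygon>
</Placemark>"""

fid, points = parse_placemark(ET.fromstring(sample))
print(fid, len(points), points[0])
```

Matching on the local tag name means the same function should work whether or not the KML file declares an XML namespace.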
As you can see from the QGIS visualisation above, most of the data isn't about Warwickshire, so we can chuck that away. Alongside the original county_electoral_division_region.shp file is a .dbf (dBase, right?) file, which can be viewed in context within QGIS:
At first glance the KML file seemed to lose the link to the .dbf file, meaning that I wasn't sure which shape related to which electoral division. The .dbf file can be opened in Excel, and eventually I worked out that the FID value in the KML was the same as the row number (starting from zero) of the electoral division in the .dbf file. Row numbers start from 1 in Excel, so to save my addled brain I added a new column in the Excel file numbering the rows from zero. After a while I'd managed to chop the original 108 MB KML file down to 3 MB. TextWrangler (and possibly its big brother BBEdit) seems to be the only Mac OS X text editor that can handle that size of file without dying, by the way - beloved TextMate just seems to hang.
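The zero-based numbering can also be done in code rather than in an extra Excel column. Here is a minimal Python sketch, assuming the .dbf has been exported to a CSV with a header row and that the first data record corresponds to FID 0. The column name NAME and the division names are hypothetical placeholders, not taken from the real file.

```python
import csv
import io


def names_by_fid(csv_file, name_column):
    """Map zero-based record number (the KML FID) to a division name."""
    reader = csv.DictReader(csv_file)  # consumes the header row
    # enumerate() starts at 0, so the first data record gets FID 0.
    return {fid: row[name_column] for fid, row in enumerate(reader)}


# Stand-in for the CSV exported from the .dbf file (made-up rows).
sample_csv = io.StringIO(
    "NAME\n"
    "Example Division A\n"
    "Example Division B\n"
    "Example Division C\n"
)

lookup = names_by_fid(sample_csv, "NAME")
print(lookup[1])
```

With a lookup like this, the FID found in each Placemark can be turned straight into a division name.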
Putting the data into a database
So once I had the KML file, what to do next? I really wanted to link the electoral division geography with my Warwickshire election data site. Ideally I'd have a PostGIS-enabled version of PostgreSQL to shove all the data into, but I don't have one. Some desperate scanning round the web produced the revelation that MySQL can store polygons as binary data, it's just that it can't do very much with them.
For now I decided to put it all into the MySQL database, using the SimpleXML extension in PHP. This was also a chance to tweak the precision of the points - the dataset stores each co-ordinate to around 15 decimal places (16 or 17 significant digits), but our intended destination, Google Maps, can only deal with about 6 decimal places, so I rounded off the point co-ordinates as I loaded them into the database.
From reading around later on, it seems I've gone the long way around this - ogr2ogr should be able to load data from a Shapefile directly into a MySQL database. If nothing else, I've got myself a KML import script which could be handy.
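The rounding step looks roughly like this. This is a minimal Python sketch rather than the PHP import script itself, and writing the ring out as WKT (well-known text) is an assumption about how the polygon might be handed to a database, not a description of what was actually stored. The function name is hypothetical, and only the first point comes from the real data.

```python
def ring_to_wkt(points, places=6):
    """Round (lat, lon) points and return a WKT POLYGON string.

    WKT lists each point as "x y", i.e. longitude first, then latitude.
    """
    rounded = [(round(lat, places), round(lon, places)) for lat, lon in points]
    # A polygon ring must end where it starts; close it if needed.
    if rounded[0] != rounded[-1]:
        rounded.append(rounded[0])
    body = ", ".join(
        f"{lon:.{places}f} {lat:.{places}f}" for lat, lon in rounded
    )
    return f"POLYGON(({body}))"


# First point is from the KML sample; the other two are made up.
ring = [
    (52.164132540593961, -1.667473197956685),
    (52.17, -1.66),
    (52.16, -1.65),
]
print(ring_to_wkt(ring))
```

Rounding before checking whether the ring is closed avoids a near-duplicate end point that differs only in the digits being thrown away.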
Simplifying the data for display on the web
Counting up all the points in the Warwickshire subset of the dataset gives us (deep breath) 81624 points. Which is quite a lot. Given how slowly Quantum GIS rendered the dataset, I decided to try to simplify the polygons for display on the web.
The pink colour is a pretty poor choice to demonstrate this, but hopefully it'll make sense - here's a simplified version of the Aston Cantlow electoral division - the pink line has 813 points, the yellow line has 74 points.
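For anyone wanting to try something similar, one common way of thinning out a line is the Douglas-Peucker algorithm. The Python sketch below is an illustration of that technique, not the simplifier program used here, and its tolerance parameter is not the same thing as the "setting" numbers quoted in this post. Measuring distances directly in degrees is a rough shortcut; working in eastings and northings would give distances in metres.

```python
import math


def segment_distance(p, a, b):
    """Distance from point p to the segment a-b, in coordinate units."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        # a and b coincide (e.g. a closed ring), so measure to that point.
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment, clamping to its two ends.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def simplify(points, tolerance):
    """Douglas-Peucker: keep the end points, plus any point that lies
    further than `tolerance` from the line joining its kept neighbours."""
    if len(points) < 3:
        return list(points)
    keep = [False] * len(points)
    keep[0] = keep[-1] = True
    stack = [(0, len(points) - 1)]  # explicit stack avoids deep recursion
    while stack:
        first, last = stack.pop()
        worst, index = 0.0, None
        for i in range(first + 1, last):
            d = segment_distance(points[i], points[first], points[last])
            if d > worst:
                worst, index = d, i
        if index is not None and worst > tolerance:
            keep[index] = True
            stack.append((first, index))
            stack.append((index, last))
    return [p for p, kept in zip(points, keep) if kept]


# Made-up line: the small wobble at (1, 0.01) should be dropped.
line = [(0, 0), (1, 0.01), (2, 0), (3, 1), (4, 0)]
simplified = simplify(line, tolerance=0.1)
print(simplified)
```

A larger tolerance throws away more points, so the same routine can be run at several settings to trade detail against file size.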
I also wrote a routine to export the whole dataset as a simplified KML file - this is at setting 3000, which results in a 176 KB KML file and 6647 points (just over 8% of the points in the full version):
Here's a link to the KML in Google Maps, with 6647 points - notice on the left that the areas aren't labelled, yet.
After all that, I did an export of the dataset at 99% of the points in the full version, resulting in a 1.8 MB KML - this is what I get, which still works quickly in Mac OS X Safari, and was actually OK in IE7 too.
I was interested that IE could cope with this many points - it made me wonder if something was being done on Google's server side to smooth the points out at a particular zoom level.
I also wrote a variation on this to output back to the database rather than out as a KML file - this meant I could quickly experiment with different generalisations of the data to check performance.
Linking the geographic areas to the election data
The info as to which shape is which area is stored in the associated .dbf file - I'd converted it earlier to a CSV file, so I wrote a routine to compare the titles of the electoral divisions in the database against the titles of the areas in the CSV; not perfect, but good enough.
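A title comparison along these lines can be sketched in Python using the standard library's difflib. This is not the routine written for the site; the normalisation rules (dropping a trailing "ED" and treating "&" as "and"), the cutoff value and the example titles other than Aston Cantlow are all assumptions made for illustration.

```python
import difflib
import re


def normalise(title):
    """Lower-case a title and strip the bits that tend to differ."""
    title = title.lower().replace("&", " and ")
    title = re.sub(r"\b(ed|electoral division)\b", " ", title)
    title = re.sub(r"[^a-z0-9]+", " ", title)
    return " ".join(title.split())


def match_titles(db_titles, csv_titles, cutoff=0.8):
    """Map each database title to its closest CSV title, or None."""
    lookup = {normalise(t): t for t in csv_titles}
    matches = {}
    for title in db_titles:
        close = difflib.get_close_matches(
            normalise(title), list(lookup), n=1, cutoff=cutoff
        )
        matches[title] = lookup[close[0]] if close else None
    return matches


# Hypothetical titles; the CSV side stands in for the .dbf export.
db_titles = ["Aston Cantlow", "Example North & East", "Zzz"]
csv_titles = ["Aston Cantlow ED", "Example North and East ED", "Somewhere Else ED"]

matches = match_titles(db_titles, csv_titles)
for db_title, csv_title in matches.items():
    print(db_title, "->", csv_title)
```

Anything that comes back as None is a title to check by hand, which fits the "not perfect, but good enough" spirit.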
Once I knew which shape was which, I could then associate the election data against it, and build a map.
Final simplified version
I created a row in the areas table for simplified versions of the polygon - I went for setting 4000 on my simplifier program, which meant 8186 points, 10% of the size of the original.
Somehow I must've had too much time to myself - here's a comparison of the speed differences between the original and simplified map in Mac OS X Safari 4.0.5.