Geospatial Databases

Postgres & PostGIS

Postgres is an open source database that initially came from Stonebraker's group during his time at Berkeley. It has developed into one of the most advanced open source relational database systems and PostGIS is a powerful extension to the Postgres database that enables geospatial columns, indexing and functions.

The first order of things after installation is to create a database with the PostGIS extension. After that, a column with a geometry data type can be created. Geometry is a general data type for spatial data that can store either points or polygons.

Lets say we have the 'latitude' and 'longitude' columns imported from a CSV file into a Postgres table called 'burningarea' and we have run an ALTER on the table to create a geom type column called 'the_geom', we just have to run an update on the table to set 'the_geom' column to create a geom from the 'latitude' and 'longitude' columns using EPSG 4326 projection standard.

UPDATE burningarea
SET the_geom = ST_GeomFromText('POINT(' || longitude || ' ' || latitude || ')',4326);

PostGIS is powerful because it supports spatial functions. Functions with an ST (spatial type) prefix are spatial functions in PostGIS. These functions are divided into:

  • Geometry Constructors
  • Geometry Accessors
  • Geometry Editors
  • Geometry Outputs
  • Operators
  • Spatial Relationships and Measurements
  • SFCGAL Functions
  • Geometry Processing
  • Linear Referencing
  • Temporal Support

A full reference can be obtained from the postgis documentation. However, I should point out that there are many useful functions, like the:

  • ST_Distance — For geometry type Returns the 2D Cartesian distance between two geometries in projected units (based on spatial ref). For geography type defaults to return minimum geodesic distance between two geographies in meters.
  • ST_Covers — Returns 1 (TRUE) if no point in Geometry B is outside Geometry A
  • ST_Intersects — Returns TRUE if the Geometries/Geography "spatially intersect in 2D" - (share any portion of space) and FALSE if they don't (they are Disjoint). For geography -- tolerance is 0.00001 meters (so any points that close are considered to intersect)
  • ST_ConvexHull — The convex hull of a geometry represents the minimum convex geometry that encloses all geometries within the set.
  • ST_DWithin — Returns true if the geometries are within the specified distance of one another. For geometry units are in those of spatial reference and For geography units are in meters and measurement is defaulted to use_spheroid=true (measure around spheroid), for faster check, use_spheroid=false to measure along sphere.

Spatial indices are one of the greatest assets of PostGIS. The ST_DWithin function can determine the relationship between two geometries much faster by determining the relationship between the bounding boxes of the geometries. Indices take this concept one step further by indexing the bounding box of the geometry rather than the geometry itself.

Spatial indices are simple to create in PostGIS. The following commands would create an index on the geometry column of the haze table.

CREATE INDEX haze_geom ON haze USING GIST (the_geom);

A comprehensive tutorial to PostGIS is available from OSGeo.

MySQL

MySQL has recently implemented some support for geospatial functions in its latest versions. The support is less mature than PostGIS, however, is useful if your data is already in a MySQL database. Here is the reference for the spatial extension.

Columnar Databases

Columnar databases are relational databases that store data in column format rather than rows. There are a few advantages to this format. First of all because the data stored along a column is usually of similar type, it can often be compressed much more efficiently (think of a column storing the type of goods, there will be many repeats of a type like 'grocery' or 'office equipment' which can then be compressed much more). While retrieving columns from a database, if the query requires just 3 columns from a row with 10 columns, the entire row need not be retrieved. Together with saving storage space for headers and other benefits, columnar databases have been very popular with data warehousing systems. The disadvantage though is that although read performance is excellent, write performance requires some clever engineering to perform well. Generally the approach is to insert in-memory first and flush to disk the sorted columns (sorting enabled very good compression ratios).

Columnar compression and faster access seems like an advantage for sensor data. One such advanced columnar store is Vertica, which has good performance in data warehousing and uses projections instead of indexes for speed. Vertica supports some spatial functions, although not as complete as those offered by PostGIS.

Others

Some other databases support spatial extensions and while there are no leading out-of-the-box spatial databases, NoSQL alternatives like MongoDB have shown decent spatial support. MongoDB supports spatial indexes and GeoJSON with a few functions like geoWithin. Cassandra (somewhat columnar) is a highly scalable, eventually consistent database. A startup tried that tried to use Cassandra to build a geospatial solution was SimpleGeo.