Creating Geo-Enabled applications with MongoDB, GeoJSON and MySQL

This blog post will discuss creating geo-enabled applications with MongoDB, GeoJSON and MySQL.

Introduction

Recently I published a blog post about the new GIS features in MySQL 5.7. Today I’ve looked into how to use MongoDB (I’ve tested with 3.0 and 3.2, with 3.2 being much faster) for the same purpose. I will also talk about GIS in MySQL and MongoDB at Percona Live next week (together with my colleague Michael Benshoof).

MongoDB and GIS

MongoDB has a very useful feature called “geoNear.” There are other MongoDB spatial functions available to calculate the distance on a sphere (like the Earth), i.e.

$nearSphere

 ,

$centerSphere

,

$near

 – but all of them have restrictions. The most important one is that they do not support sharding. The 

geoNear

 command in MongodDB, on the other hand, supports sharding. I will use

geoNear

 in this post.

For this test, I exported Open Street Map data from MySQL to MongoDB (see the “Creating GEO-enabled applications with MySQL 5.6” post for more details on how to load this data to MySQL).

  1. Export the data to JSON. In MySQL 5.7, we can use JSON_OBJECT to generate the JSON file:
        mysql> SELECT JSON_OBJECT('name', replace(name, '"', ''),
           'other_tags', replace(other_tags, '"', ''), 'geometry', st_asgeojson(shape)) as j
           FROM `points`
           INTO OUTFILE '/var/lib/mysql-files/points.json';
    Query OK, 13660667 rows affected (4 min 1.35 sec)
  2. Use
    mongoimport
      to import JSON into MongoDB (I’m using 24 threads, -j 24, to use parallel import):
        mongoimport --db osm --collection points -j 24 --file /var/lib/mysql-files/points.json
    2016-04-11T22:38:10.029+0000    connected to: localhost
    2016-04-11T22:38:13.026+0000    [........................] osm.points  31.8 MB/2.2 GB (1.4%)
    2016-04-11T22:38:16.026+0000    [........................] osm.points  31.8 MB/2.2 GB (1.4%)
    2016-04-11T22:38:19.026+0000    [........................] osm.points  31.8 MB/2.2 GB (1.4%)
    …
    2016-04-11T23:12:13.447+0000    [########################] osm.points  2.2 GB/2.2 GB (100.0%)
    2016-04-11T23:12:15.614+0000    imported 13660667 documents
  3. Create a 2d index:
    mongo
    > use osm
    switched to db osm
    > db.points.createIndex({ geometry : "2dsphere" } )
    {
            "createdCollectionAutomatically" : false,
            "numIndexesBefore" : 1,
            "numIndexesAfter" : 2,
            "ok" : 1
    }

Another option would be using the osm2mongo Ruby script, which will convert the osm file and load it directly to MongoDB.

Now I can use the

geoNear

 command to find all the restaurants near my location:

> db.runCommand( { geoNear: "points",
                   near: { type: "Point" ,  coordinates: [ -78.9064543, 35.9975194 ]},
                   spherical: true,
... query: { name: { $exists: true, $ne:null},
                   "other_tags": { $in: [ /.*amenity=>restaurant.*/, /.*amenity=>cafe.*/ ] } },
                   "limit": 5, "maxDistance": 10000 } )
{
        "results" : [
                {
                        "dis" : 127.30183814835166,
                        "obj" : {
                                "_id" : ObjectId("570329164f45f7f0d66f8f13"),
                                "name" : "Pop's",
                                "geometry" : {
                                        "type" : "Point",
                                        "coordinates" : [
                                                -78.9071795,
                                                35.998501
                                        ]
                                },
                                "other_tags" : "addr:city=>Durham,addr:country=>US,addr:housenumber=>605,addr:street=>West Main Street,amenity=>restaurant,building=>yes"
                        }
                },
                {
                        "dis" : 240.82201047521244,
                        "obj" : {
                                "_id" : ObjectId("570329df4f45f7f0d68c16cb"),
                                "name" : "toast",
                                "geometry" : {
                                        "type" : "Point",
                                        "coordinates" : [
                                                -78.9039761,
                                                35.9967069
                                        ]
                                },
                                "other_tags" : "addr:full=>345 West Main Street, Durham, NC 27701, US,amenity=>restaurant,website=>http://toast-fivepoints.com/"
                        }
                },
...
}

MongoDB 3.0 vs 3.2 with geoNear

MongoDB 3.2 features Geospatial Optimization:

MongoDB 3.2 introduces version 3 of 2dsphere indexes, which index GeoJSON geometries at a finer gradation. The new version improves performance of 2dsphere index queries over smaller regions. In addition, for both 2d indexes and 2dsphere indexes, the performance of geoNear queries has been improved for dense datasets.

I’ve tested the performance of the above

geoNear

 query with MongoDB 3.0 and MongoDB 3.2 (both the old and new versions of 2dsphere index). All the results statistics are for a

"limit": 5

 and

"maxDistance": 10000

.

MongoDB 3.0, index version 2:

> db.points.getIndexes()
...
        {
                "v" : 1,
                "key" : {
                        "geometry" : "2dsphere"
                },
                "name" : "geometry_2dsphere",
                "ns" : "osm.points",
                "2dsphereIndexVersion" : 2
        }
]
"stats" : {
        "nscanned" : 1728,
        "objectsLoaded" : 1139,
        "avgDistance" : 235.76379903759667,
        "maxDistance" : 280.2681226202938,
        "time" : 12
},

MongoDB 3.2, index version 2:

> db.points.getIndexes()
[
...
        {
                "v" : 1,
                "key" : {
                        "geometry" : "2dsphere"
                },
                "name" : "geometry_2dsphere",
                "ns" : "osm.points",
                "2dsphereIndexVersion" : 2
        }
]
...
"stats" : {
          "nscanned" : 513,
          "objectsLoaded" : 535,
          "avgDistance" : 235.76379903759667,
          "maxDistance" : 280.2681226202938,
          "time" : 5
},

What is interesting here is that even with the

"2dsphereIndexVersion" : 2

, MongoDB 3.2 performs much faster and scans a much smaller number of documents.

MongoDB 3.2, index version 3:

> db.points.dropIndex("geometry_2dsphere")
{ "nIndexesWas" : 2, "ok" : 1 }
> db.points.createIndex({ geometry : "2dsphere" } )
{
        "createdCollectionAutomatically" : false,
        "numIndexesBefore" : 1,
        "numIndexesAfter" : 2,
        "ok" : 1
}
> db.points.getIndexes()
[
        {
                "v" : 1,
                "key" : {
                        "_id" : 1
                },
                "name" : "_id_",
                "ns" : "osm.points"
        },
        {
                "v" : 1,
                "key" : {
                        "geometry" : "2dsphere"
                },
                "name" : "geometry_2dsphere",
                "ns" : "osm.points",
                "2dsphereIndexVersion" : 3
        }
]
"stats" : {
          "nscanned" : 144,
          "objectsLoaded" : 101,
          "avgDistance" : 235.76379903759667,
          "maxDistance" : 280.2681226202938,
          "time" : 1
        },

That is significantly faster, 1ms for five results!

MySQL and GeoJSON revisited

To compare to the performance of the above query, I’ve created a similar query in MySQL. First of all, we will need to use the good old bounding rectangle (envelope) trick to only include the points in the 10 miles radius (or so). If we don’t, MySQL will not be able to use spatial (RTREE) index. I’ve created the following function to generate the envelope:

DELIMITER //
CREATE DEFINER = current_user() FUNCTION create_envelope(lat decimal(20, 14), lon decimal(20, 14), dist int) RETURNS geometry DETERMINISTIC
begin
declare point_text varchar(255);
declare l varchar(255);
declare p geometry;
declare env geometry;
declare rlon1 double;
declare rlon2 double;
declare rlat1 double;
declare rlat2 double;
set point_text = concat('POINT(', lon, ' ', lat, ')');
set p = ST_GeomFromText(point_text, 1);
set rlon1 = lon-dist/abs(cos(radians(lat))*69);
set rlon2 = lon+dist/abs(cos(radians(lat))*69);
set rlat1 = lat-(dist/69);
set rlat2 = lat+(dist/69);
set l = concat('LineString(', rlon1, ' ', rlat1, ',', rlon2 , ' ', rlat2, ')');
set env= ST_Envelope(ST_GeomFromText(l, 1));
return env;
end //
DELIMITER ;
mysql> set @lat= 35.9974043;
Query OK, 0 rows affected (0.00 sec)
mysql> set @lon = -78.9045615;
Query OK, 0 rows affected (0.00 sec)
mysql> select st_astext(create_envelope(@lat, @lon, 10));
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| st_astext(create_envelope(@lat, @lon, 10)) |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| POLYGON((-79.08369589058249 35.852476764,-78.72542710941751 35.852476764,-78.72542710941751 36.142331836,-79.08369589058249 36.142331836,-79.08369589058249 35.852476764)) |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

Then we can use the following query (an update of the GeoJSON query from my previous post):

set @lat= 35.9974043;
set @lon = -78.9045615;
set @p = ST_GeomFromText(concat('POINT(', @lon, ' ', @lat, ')'), 1);
set group_concat_max_len = 100000000;
SELECT CONCAT('{
  "type": "FeatureCollection",
  "features": [
  ',
   GROUP_CONCAT('{
   "type": "Feature",
      "geometry": ', ST_AsGeoJSON(shape), ',
      "properties": {"distance":', st_distance_sphere(shape, @p) , ', "name":"', name , '"}
   }' order by st_distance_sphere(shape, @p)),
  ']
}') as j
FROM points_new
WHERE st_within(shape, create_envelope(@lat, @lon, 10))
          and (other_tags like '%"amenity"=>"cafe"%' or other_tags like '%"amenity"=>"restaurant"%')
          and name is not null
          and st_distance_sphere(shape, @p) < 1000;
...
1 row in set (0.04 sec)

The time is slower: 40ms in MySQL compared to 1ms – 12ms in MongoDB. The box is AWS EC2 t2.medium.

To recap the difference between MongoDB

geoNear

 and MySQL

st_distance_sphere

:

  • MongoDB
    geoNear
     uses 2dsphere index, so it is fast; however, it can’t just calculate the distance between two arbitrary points
  • MySQL st_distance_sphere is a helper function and will only calculate the distance between two points; it will not use an index – we will have to use the
    create_envelope
     function to restrict the search so MySQL will use an index

Time-wise, this is not an apples to apples comparison as the query is quite different and uses a different technique.

Visualizing the results

Results for GeoJSON for Google Maps API:

{
 "type": "FeatureCollection",
 "features": [
 {
  "type": "Feature",
     "geometry": {"type": "Point", "coordinates": [-78.9036457, 35.997125]},
     "properties": {"distance":87.67869122893659, "name":"Pizzeria Toro"}
  },{
  "type": "Feature",
     "geometry": {"type": "Point", "coordinates": [-78.9039761, 35.9967069]},
     "properties": {"distance":93.80064086575564, "name":"toast"}
  },{
  "type": "Feature",
     "geometry": {"type": "Point", "coordinates": [-78.9031929, 35.9962871]},
     "properties": {"distance":174.8300018385443, "name":"Dame's Chicken and Waffles"}
  },
...
}

Now we can add those on a map:

Back to MongoDB: pluses and minuses

MongoDB uses Google’s S2 library to perform GIS calculations. The

geoNear

 command is fast and easy to use for finding points of interests near you (which is the most common operation). However, full GIS support does not natively exist.

Another issue I came across when creating a 2dsphere index: MongoDB is very strict when checking the lines and polygons. For example:

> db.lines.createIndex({ geometry : "2dsphere" } )
{
        "createdCollectionAutomatically" : false,
        "numIndexesBefore" : 1,
        "errmsg" : "exception: Can't extract geo keys: { _id: ObjectId('570308864f45f7f0d6dfbed2'), name: "75 North", geometry: { type: "LineString", coordinates: [ [ -85.808852, 41.245582 ], [ -85.808852, 41.245582 ] ] }, other_tags: "tiger:cfcc=>A41,tiger:county=>Kosciusko, IN,tiger:name_base=>75,tiger:name_direction_suffix=>N,tiger:reviewed=>no" }  GeoJSON LineString must have at least 2 vertices: [ [ -85.808852, 41.245582 ], [ -85.808852, 41.245582 ] ]",
        "code" : 16755,
        "ok" : 0
}

MongoDB complains about this: type: “LineString”, coordinates: [ [ -85.808852, 41.245582 ], [ -85.808852, 41.245582 ] ]

This is a “bad” line string as the starting point and ending point are the same. I had to remove the bad data from my MongoDB imported dataset, which was tricky by itself. MongoDB (as opposed to MySQL) does not have a native way to compare the values inside the JSON, so I had to use $where construct – which is slow and acquires a global lock:

> db.lines.remove({"geometry.type": "LineString", "geometry.coordinates": {$size:2},
$where: "this.geometry.coordinates[0][0] ==  this.geometry.coordinates[1][0]
      && this.geometry.coordinates[0][1] == this.geometry.coordinates[1][1]" })
WriteResult({ "nRemoved" : 22 })

After that, I was able to add the 2dsphere index.

Conclusion

MongoDB looks good, is pretty fast and easy for geo-proximity search queries – until you go outside of the one function and need full GIS support (which does not natively exist). It may be trickier to implement other GIS functions like

st_contains or st_within

.

Update: as pointed out, MongoDB actually supports $geoWithin and $geoIntersects GIS functions.

Update 2: I was asked about MySQL and GeoJSON: why not to use the ST_MakeEnvelope function. One of the issues with

ST_MakeEnvelope

 is that it only works with SRID 0 (it requires point geometry arguments with an SRID of 0) and OSM data is stored with SRID 1. But also I will need to “add” 10 miles to my point. The only way to do that is to calculate the new point, 10 miles apart from “my” point/location. I have to use a custom function to manipulate the lat/lon pair.

The explain plan for the MySQL GeoJSON query shows that MySQL uses SHAPE (Spatial) index:

mysql> explain SELECT CONCAT('{
 "type": "FeatureCollection",
 "features": [
 ',
 GROUP_CONCAT('{
 "type": "Feature",
 "geometry": ', ST_AsGeoJSON(shape), ',
 "properties": {"distance":', st_distance_sphere(shape, @p) , ', "name":"', name , '"}
 }' order by st_distance_sphere(shape, @p)),
 ']
}') as j
FROM points_new
WHERE st_within(shape, create_envelope(@lat, @lon, 10))
 and (other_tags like '%"amenity"=>"cafe"%' or other_tags like '%"amenity"=>"restaurant"%')
 and name is not null
 and st_distance_sphere(shape, @p) < 1000;
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: points_new
   partitions: NULL
         type: range
possible_keys: SHAPE
          key: SHAPE
      key_len: 34
          ref: NULL
         rows: 665
     filtered: 18.89
        Extra: Using where
1 row in set, 1 warning (0.00 sec)

And if I remove “st_within(shape, create_envelope(@lat, @lon, 10))” from the query it will show the full table scan:

mysql> explain SELECT CONCAT('{
 "type": "FeatureCollection",
 "features": [
 ',
 GROUP_CONCAT('{
 "type": "Feature",
 "geometry": ', ST_AsGeoJSON(shape), ',
 "properties": {"distance":', st_distance_sphere(shape, @p) , ', "name":"', name , '"}
 }' order by st_distance_sphere(shape, @p)),
 ']
}') as j
FROM points_new
WHERE (other_tags like '%"amenity"=>"cafe"%' or other_tags like '%"amenity"=>"restaurant"%')
 and name is not null
 and st_distance_sphere(shape, @p) < 1000;
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: points_new
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 11368798
     filtered: 18.89
        Extra: Using where
1 row in set, 1 warning (0.00 sec)