Geo-Spatial Indexes

Since version 3.4.0, ArangoDB features a geospatial index based on Google S2, which supersedes the previous geo index implementation. Indexing is supported for a subset of the GeoJSON geometry types as well as for simple latitude/longitude pairs.

AQL’s geospatial functions and GeoJSON constructors are described in Geo functions.

Using a Geo-Spatial Index

The geospatial index supports containment and intersection queries for various geometric 2D shapes. You should mainly use AQL queries to perform these types of operations. The index can operate in two different modes, depending on whether or not you want to use the GeoJSON data format. The mode is mainly selected via the geoJson option when creating the index.

This index assumes coordinates with the latitude between -90 and 90 degrees and the longitude between -180 and 180 degrees. A geo index will ignore all documents which do not fulfill these requirements.
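The range rule above can be expressed as a small client-side predicate, for example to find documents that the index would skip before loading data. This is a sketch in JavaScript (the arangosh language) under two assumptions: the helper name isIndexableLatLng is hypothetical, and both bounds are treated as inclusive. ArangoDB's own validation is not reproduced here.

```javascript
// Hypothetical helper (not part of ArangoDB): mirrors the documented rule that
// the geo index only accepts latitudes in [-90, 90] and longitudes in [-180, 180].
// Assumption: both bounds are inclusive.
function isIndexableLatLng(lat, lng) {
  return (
    Number.isFinite(lat) && // rejects NaN, Infinity and non-numbers such as "50"
    Number.isFinite(lng) &&
    lat >= -90 && lat <= 90 &&
    lng >= -180 && lng <= 180
  );
}

console.log(isIndexableLatLng(50.94, 6.96)); // true
console.log(isIndexableLatLng(95, 10));      // false: latitude out of range
console.log(isIndexableLatLng("50", 10));    // false: not a number
```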

GeoJSON Mode

To create an index in GeoJSON mode execute:

collection.ensureIndex({ type: "geo", fields: [ "geometry" ], geoJson:true })

This creates the index on all documents and uses geometry as the attribute path, whose value is either a GeoJSON Geometry Object or a coordinate array. A coordinate array must contain at least two numeric values: the longitude (first value) and the latitude (second value). This corresponds to the format described in RFC 7946 Position.

All documents that do not have the attribute path, or that have a non-conforming value in it, are excluded from the index.

A geo index is implicitly sparse, and there is no way to control its sparsity. If the index is successfully created, an object with the index details, including the index identifier, is returned.
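To make the GeoJSON-mode rules concrete, the following hypothetical classifier (classifyGeoJsonModeValue is not an ArangoDB function) approximates which values at the indexed attribute path would be indexed. It only checks the outer shape of a value, so it is a rough pre-check under that simplification, not a replacement for ArangoDB's own validation.

```javascript
// Hypothetical, simplified classifier for values at the indexed attribute path
// in GeoJSON mode. It checks only the outer shape; ArangoDB validates the full
// geometry (coordinate ranges, ring closure, etc.) itself.
const SUPPORTED_TYPES = new Set([
  "Point", "MultiPoint", "LineString",
  "MultiLineString", "Polygon", "MultiPolygon",
]);

function classifyGeoJsonModeValue(value) {
  // RFC 7946 position: at least two numbers, longitude first, latitude second.
  if (Array.isArray(value)) {
    const ok = value.length >= 2 && value.every((v) => typeof v === "number");
    return ok ? "position" : null;
  }
  // GeoJSON Geometry Object with a supported "type" and a "coordinates" array.
  if (value !== null && typeof value === "object" &&
      SUPPORTED_TYPES.has(value.type) && Array.isArray(value.coordinates)) {
    return value.type;
  }
  return null; // missing or non-conforming: the document is not indexed
}

console.log(classifyGeoJsonModeValue([6.96, 50.94]));                                  // position
console.log(classifyGeoJsonModeValue({ type: "Point", coordinates: [6.96, 50.94] }));  // Point
console.log(classifyGeoJsonModeValue({ type: "GeometryCollection", geometries: [] })); // null
```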

Non-GeoJSON mode

This index mode exclusively supports indexing on coordinate arrays. Values that contain GeoJSON or other types of data will be ignored. In the non-GeoJSON mode the index can be created on one or two fields.

The following examples will work in the arangosh command shell.

To create a geo-spatial index on all documents using latitude and longitude as separate attribute paths, two paths need to be specified in the fields array:

collection.ensureIndex({ type: "geo", fields: [ "latitude", "longitude" ] })

The first field is always defined to be the latitude and the second is the longitude. The geoJson flag is implicitly false in this mode.

Alternatively, you can specify only one field:

collection.ensureIndex({ type: "geo", fields: [ "location" ], geoJson:false })

It creates a geospatial index on all documents using location as the path to the coordinates. The value of the attribute has to be an array with at least two numeric values. The array must contain the latitude (first value) and the longitude (second value).

All documents that do not have the attribute path(s), or that have a non-conforming value in them, are excluded from the index.

A geo index is implicitly sparse, and there is no way to control its sparsity. If the index is successfully created, an object with the index details, including the index identifier, is returned.
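The two modes use opposite coordinate orders: a single-field non-GeoJSON index expects [latitude, longitude], whereas GeoJSON positions are [longitude, latitude]. When migrating data between the two layouts, a conversion along the lines of this sketch may help; both helper names are hypothetical.

```javascript
// Hypothetical helpers: convert between the non-GeoJSON single-field layout
// ([latitude, longitude]) and a GeoJSON Point ([longitude, latitude]).
function latLngToGeoJsonPoint([lat, lng]) {
  return { type: "Point", coordinates: [lng, lat] };
}

function geoJsonPointToLatLng(point) {
  const [lng, lat] = point.coordinates;
  return [lat, lng];
}

const latLng = [50.94, 6.96]; // value stored for an index created with geoJson: false
const geometry = latLngToGeoJsonPoint(latLng);
console.log(JSON.stringify(geometry));       // {"type":"Point","coordinates":[6.96,50.94]}
console.log(geoJsonPointToLatLng(geometry)); // [ 50.94, 6.96 ]
```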


Indexed GeoSpatial Queries

The geospatial index supports a variety of AQL queries, which can be built with the help of the geo utility functions. There are three specific geo functions that can be optimized, provided that they are used correctly: GEO_DISTANCE, GEO_CONTAINS, and GEO_INTERSECTS. Additionally, there is built-in support for optimizing the older geo functions DISTANCE, NEAR, and WITHIN (the last two only if they are used in their four-argument version, without distanceName).

When in doubt whether your query is being optimized properly, check the AQL explain output for index usage.

Query for Results near Origin (NEAR type query)

A basic example of a query for results near an origin point:

FOR x IN geo_collection
  FILTER GEO_DISTANCE([@lng, @lat], x.geometry) <= 100000
  RETURN x._key

The first parameter can be a GeoJSON object or a coordinate array in [longitude, latitude] order. The second parameter is the document field on which the index was created. GEO_DISTANCE always returns the distance in meters, so this query returns results up to 100 km away.
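To estimate client-side which documents such a filter should return, the distance can be approximated with the haversine formula on a sphere with a 6371 km radius (the Earth model described in the GeoJSON section of this page). This is a sketch: haversineMeters is a hypothetical helper, and ArangoDB's S2-based computation may differ from it slightly.

```javascript
// Great-circle distance in meters on a sphere with the Earth's volumetric mean
// radius (6371 km), using the haversine formula. Positions are [lng, lat].
const EARTH_RADIUS_M = 6371000;

function toRadians(deg) {
  return (deg * Math.PI) / 180;
}

function haversineMeters([lng1, lat1], [lng2, lat2]) {
  const dLat = toRadians(lat2 - lat1);
  const dLng = toRadians(lng2 - lng1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRadians(lat1)) * Math.cos(toRadians(lat2)) * Math.sin(dLng / 2) ** 2;
  // Math.min guards against rounding pushing the argument slightly above 1.
  return 2 * EARTH_RADIUS_M * Math.asin(Math.min(1, Math.sqrt(a)));
}

// Client-side analogue of FILTER GEO_DISTANCE([@lng, @lat], x.geometry) <= 100000
const origin = [6.96, 50.94];
const candidate = [7.10, 50.73];
const d = haversineMeters(origin, candidate);
console.log(Math.round(d), d <= 100000); // about 25 km, so: true
```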

Query for Sorted Results near Origin (NEAR type query)

A basic example of a query for the 1000 nearest results to an origin point (ascending sorting):

FOR x IN geo_collection
  SORT GEO_DISTANCE([@lng, @lat], x.geometry) ASC
  LIMIT 1000
  RETURN x._key

The first parameter can be a GeoJSON object or a coordinate array in [longitude, latitude] order. The second parameter is the document field on which the index was created.

You may also get results farthest away (distance sorted in descending order):

FOR x IN geo_collection
  SORT GEO_DISTANCE([@lng, @lat], x.geometry) DESC
  LIMIT 1000
  RETURN x._key
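Conceptually, the SORT ... LIMIT queries above rank documents by distance and keep the first n. The brute-force sketch below does the same in memory for a handful of hypothetical documents with [longitude, latitude] geometry. It scans every document, which is exactly the work the geo index avoids, and it uses a haversine approximation rather than ArangoDB's own distance code.

```javascript
// Brute-force, unindexed equivalent of
//   SORT GEO_DISTANCE([@lng, @lat], x.geometry) ASC|DESC LIMIT k RETURN x._key
// for in-memory documents whose geometry is a [longitude, latitude] position.
const RADIUS_M = 6371000;
const rad = (deg) => (deg * Math.PI) / 180;

// Haversine great-circle distance in meters (approximation of GEO_DISTANCE).
function distanceMeters([lng1, lat1], [lng2, lat2]) {
  const a =
    Math.sin(rad(lat2 - lat1) / 2) ** 2 +
    Math.cos(rad(lat1)) * Math.cos(rad(lat2)) * Math.sin(rad(lng2 - lng1) / 2) ** 2;
  return 2 * RADIUS_M * Math.asin(Math.min(1, Math.sqrt(a)));
}

// Rank all documents by distance to origin and keep the first k keys.
function sortByDistance(docs, origin, k, descending = false) {
  return docs
    .map((doc) => ({ key: doc._key, dist: distanceMeters(origin, doc.geometry) }))
    .sort((a, b) => (descending ? b.dist - a.dist : a.dist - b.dist))
    .slice(0, k)
    .map((entry) => entry.key);
}

// Hypothetical sample documents.
const sampleDocs = [
  { _key: "a", geometry: [0, 0] },
  { _key: "b", geometry: [0, 2] },
  { _key: "c", geometry: [0, 1] },
];
console.log(sortByDistance(sampleDocs, [0, 0], 2));       // [ 'a', 'c' ]
console.log(sortByDistance(sampleDocs, [0, 0], 2, true)); // [ 'b', 'c' ]
```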

Query for Results within Distance

The following query returns documents that are at least 1 km and at most 100 km away from the origin, i.e. documents whose GeoJSON value is located within the specified search annulus:

FOR x IN geo_collection
  FILTER GEO_DISTANCE([@lng, @lat], x.geometry) <= 100000
  FILTER GEO_DISTANCE([@lng, @lat], x.geometry) >= 1000
  RETURN x
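The annulus condition is simply two bounds on the same distance. As a hedged client-side illustration (haversine on a 6371 km sphere; the helper names are hypothetical), points can be tested like this:

```javascript
// Hypothetical helpers: haversine distance on a 6371 km sphere, positions as
// [longitude, latitude], plus the two-sided annulus test from the query above.
const SPHERE_RADIUS_M = 6371000;
const degToRad = (deg) => (deg * Math.PI) / 180;

function greatCircleMeters([lng1, lat1], [lng2, lat2]) {
  const a =
    Math.sin(degToRad(lat2 - lat1) / 2) ** 2 +
    Math.cos(degToRad(lat1)) * Math.cos(degToRad(lat2)) * Math.sin(degToRad(lng2 - lng1) / 2) ** 2;
  return 2 * SPHERE_RADIUS_M * Math.asin(Math.min(1, Math.sqrt(a)));
}

// True if minMeters <= distance <= maxMeters, mirroring the two FILTER lines.
function inAnnulus(origin, position, minMeters, maxMeters) {
  const d = greatCircleMeters(origin, position);
  return d >= minMeters && d <= maxMeters;
}

// Roughly 0.56 km, 55.6 km and 222 km north of the origin, respectively.
const probes = [[0, 0.005], [0, 0.5], [0, 2]];
console.log(probes.map((p) => inAnnulus([0, 0], p, 1000, 100000))); // [ false, true, false ]
```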

Query for Results contained in Polygon

The following query returns documents whose stored geometry is contained within a GeoJSON Polygon:

LET polygon = GEO_POLYGON([[[60,35],[50,5],[75,10],[70,35]]])
FOR x IN geo_collection
  FILTER GEO_CONTAINS(polygon, x.geometry)
  RETURN x

The first parameter of GEO_CONTAINS must be a polygon; other types are not valid. The second parameter must be the document field on which the index was created.
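For intuition about GEO_CONTAINS with point geometries, here is a planar ray-casting point-in-polygon test. This is a swapped-in technique, not ArangoDB's algorithm: ArangoDB evaluates containment on the sphere using S2, so this sketch is only a rough approximation for small polygons away from the poles and the antimeridian, and it ignores holes. The helper name pointInRing is hypothetical.

```javascript
// Planar ray-casting point-in-polygon test on [lng, lat] coordinates.
// Casts a horizontal ray to the right of the point and counts edge crossings:
// an odd count means the point is inside. Works for open or closed rings.
function pointInRing([x, y], ring) {
  let inside = false;
  for (let i = 0, j = ring.length - 1; i < ring.length; j = i++) {
    const [xi, yi] = ring[i];
    const [xj, yj] = ring[j];
    // Does edge (j -> i) straddle the horizontal line through y, and is the
    // crossing point to the right of x?
    const crosses = (yi > y) !== (yj > y) &&
      x < ((xj - xi) * (y - yi)) / (yj - yi) + xi;
    if (crosses) inside = !inside;
  }
  return inside;
}

// The ring passed to GEO_POLYGON in the query above.
const polygonRing = [[60, 35], [50, 5], [75, 10], [70, 35]];
console.log(pointInRing([65, 20], polygonRing)); // true
console.log(pointInRing([0, 0], polygonRing));   // false
```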

Query for Results Intersecting a Polygon

The following query returns documents whose stored geometry intersects a GeoJSON Polygon:

LET polygon = GEO_POLYGON([[[60,35],[50,5],[75,10],[70,35]]])
FOR x IN geo_collection
  FILTER GEO_INTERSECTS(polygon, x.geometry)
  RETURN x

The first parameter of GEO_INTERSECTS must be a polygon; other types are not valid. The second parameter must be the document field on which the index was created.

GeoJSON

GeoJSON is a geospatial data format based on JSON. It defines several different types of JSON objects and the way in which they can be combined to represent data about geographic shapes on the Earth’s surface. GeoJSON uses a geographic coordinate reference system, World Geodetic System 1984 (WGS 84), and units of decimal degrees.

Internally, ArangoDB maps all coordinates onto a unit sphere. Distances are projected onto a sphere with the Earth’s volumetric mean radius of 6371 km. ArangoDB implements a useful subset of the GeoJSON format (RFC 7946). Feature objects and the GeometryCollection type are not supported. The supported geometry object types are:

  • Point
  • MultiPoint
  • LineString
  • MultiLineString
  • Polygon
  • MultiPolygon
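The sphere model described above can be illustrated by converting [longitude, latitude] pairs to 3D unit vectors, taking the angle between them, and scaling by the 6371 km radius. This is a sketch of the model only; the helper names are hypothetical and ArangoDB's S2-based internals are not reproduced.

```javascript
// Map [lng, lat] in degrees onto the unit sphere, measure the angle between
// the two unit vectors, and scale by the Earth's volumetric mean radius.
const EARTH_MEAN_RADIUS_M = 6371000;

function toUnitVector([lng, lat]) {
  const phi = (lat * Math.PI) / 180;    // latitude in radians
  const lambda = (lng * Math.PI) / 180; // longitude in radians
  return [Math.cos(phi) * Math.cos(lambda), Math.cos(phi) * Math.sin(lambda), Math.sin(phi)];
}

function angleBetween(a, b) {
  const dot = a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
  const cross = [
    a[1] * b[2] - a[2] * b[1],
    a[2] * b[0] - a[0] * b[2],
    a[0] * b[1] - a[1] * b[0],
  ];
  // atan2(|a x b|, a . b) stays accurate for both tiny and large angles.
  return Math.atan2(Math.hypot(...cross), dot);
}

function sphereDistanceMeters(p, q) {
  return angleBetween(toUnitVector(p), toUnitVector(q)) * EARTH_MEAN_RADIUS_M;
}

// A quarter of a great circle along the equator.
console.log(Math.round(sphereDistanceMeters([0, 0], [90, 0]))); // 10007543
```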

Point

A GeoJSON Point is a position consisting of a longitude and a latitude:

{
  "type": "Point",
  "coordinates": [100.0, 0.0]
}

MultiPoint

A GeoJSON MultiPoint is an array of positions:

{
  "type": "MultiPoint",
  "coordinates": [
    [100.0, 0.0],
    [101.0, 1.0]
  ]
}

LineString

A GeoJSON LineString is an array of two or more positions:

{
  "type": "LineString",
  "coordinates": [
    [100.0, 0.0],
    [101.0, 1.0]
  ]
}

MultiLineString

A GeoJSON MultiLineString is an array of LineString coordinate arrays:

{
  "type": "MultiLineString",
  "coordinates": [
    [
      [100.0, 0.0],
      [101.0, 1.0]
    ],
    [
      [102.0, 2.0],
      [103.0, 3.0]
    ]
  ]
}

Polygon

A GeoJSON Polygon consists of a series of closed LineString objects (ring-like). These linear rings consist of four or more vertices, with the first and last coordinate pairs being equal. The coordinates of a Polygon are an array of linear ring coordinate arrays. The first element in the array represents the exterior ring. Any subsequent elements represent interior rings (holes within the surface).

  • A linear ring may not be empty; it needs at least three distinct coordinates.
  • Within the same linear ring, consecutive coordinates may be the same; otherwise, all coordinates (except the first and the last one) need to be distinct.
  • A linear ring defines two regions on the sphere. ArangoDB will always interpret the region of smaller area to be the interior of the ring. This introduces a practical limitation that no polygon may have an outer ring enclosing more than half the Earth’s surface.
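The linear-ring rules above can be checked before inserting data. The following validator is a hypothetical sketch (validateLinearRing is not an ArangoDB function) that covers closure, the minimum number of positions, and distinctness; it does not check self-intersection or the half-Earth limitation.

```javascript
// Hypothetical validator for the linear-ring rules listed above. Returns null
// for a valid ring, otherwise a short description of the first problem found.
function samePosition(a, b) {
  return a[0] === b[0] && a[1] === b[1];
}

function validateLinearRing(ring) {
  if (!Array.isArray(ring) || ring.length < 4) {
    return "needs at least four positions";
  }
  if (!samePosition(ring[0], ring[ring.length - 1])) {
    return "first and last positions must be equal";
  }
  // Consecutive duplicates are allowed, so collapse them first,
  // then drop the closing position that repeats the first one.
  const collapsed = ring.filter((p, i) => i === 0 || !samePosition(p, ring[i - 1]));
  const openRing = collapsed.slice(0, -1);
  if (openRing.length < 3) {
    return "needs at least three distinct coordinates";
  }
  // All remaining (non-consecutive) coordinates must be distinct.
  const keys = new Set(openRing.map(([lng, lat]) => `${lng},${lat}`));
  if (keys.size !== openRing.length) {
    return "non-consecutive coordinates must be distinct";
  }
  return null;
}

const squareRing = [[100.0, 0.0], [101.0, 0.0], [101.0, 1.0], [100.0, 1.0], [100.0, 0.0]];
console.log(validateLinearRing(squareRing));              // null
console.log(validateLinearRing(squareRing.slice(0, -1))); // first and last positions must be equal
```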

No Holes:

{
  "type": "Polygon",
  "coordinates": [
    [
      [100.0, 0.0],
      [101.0, 0.0],
      [101.0, 1.0],
      [100.0, 1.0],
      [100.0, 0.0]
    ]
  ]
}

With Holes:

  • The exterior ring should not self-intersect.
  • The interior rings must be contained in the outer ring.
  • No two rings can cross each other, i.e. no ring may intersect both the interior and the exterior face of another ring.
  • Rings cannot share edges; they may, however, share vertices.
  • No ring may be empty.
  • Polygon rings should follow the right-hand rule for orientation (counterclockwise exterior rings, clockwise interior rings).

{
  "type": "Polygon",
  "coordinates": [
    [
      [100.0, 0.0],
      [101.0, 0.0],
      [101.0, 1.0],
      [100.0, 1.0],
      [100.0, 0.0]
    ],
    [
      [100.8, 0.8],
      [100.8, 0.2],
      [100.2, 0.2],
      [100.2, 0.8],
      [100.8, 0.8]
    ]
  ]
}
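Ring orientation under the right-hand rule can be checked with the shoelace formula: a positive signed area means counterclockwise, a negative one clockwise. The sketch below applies it to the polygon with a hole above. It works on raw [longitude, latitude] values in the plane, so it is an approximation that assumes small rings not crossing the antimeridian; the helper names are hypothetical.

```javascript
// Planar shoelace formula on a closed ring of [longitude, latitude] pairs.
// Positive result: counterclockwise; negative result: clockwise.
function signedArea(ring) {
  let sum = 0;
  for (let i = 0; i < ring.length - 1; i++) {
    const [x1, y1] = ring[i];
    const [x2, y2] = ring[i + 1];
    sum += x1 * y2 - x2 * y1;
  }
  return sum / 2;
}

// Right-hand rule: exterior ring (index 0) counterclockwise, holes clockwise.
function followsRightHandRule(polygonCoordinates) {
  return polygonCoordinates.every((ring, index) =>
    index === 0 ? signedArea(ring) > 0 : signedArea(ring) < 0
  );
}

// Coordinates of the "With Holes" Polygon example above.
const polygonWithHole = [
  [[100.0, 0.0], [101.0, 0.0], [101.0, 1.0], [100.0, 1.0], [100.0, 0.0]],
  [[100.8, 0.8], [100.8, 0.2], [100.2, 0.2], [100.2, 0.8], [100.8, 0.8]],
];
console.log(followsRightHandRule(polygonWithHole)); // true
```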

MultiPolygon

A GeoJSON MultiPolygon consists of multiple polygons. The “coordinates” member is an array of Polygon coordinate arrays.

  • Polygons in the same MultiPolygon may not share edges; they may, however, share coordinates.
  • Polygons and rings must not be empty.
  • A linear ring defines two regions on the sphere. ArangoDB will always interpret the region of smaller area to be the interior of the ring. This introduces a practical limitation that no polygon may have an outer ring enclosing more than half the Earth’s surface.
  • Linear rings must follow the right-hand rule for orientation (counterclockwise exterior rings, clockwise interior rings).

Example with two polygons, the second one with a hole:

{
    "type": "MultiPolygon",
    "coordinates": [
        [
            [
                [102.0, 2.0],
                [103.0, 2.0],
                [103.0, 3.0],
                [102.0, 3.0],
                [102.0, 2.0]
            ]
        ],
        [
            [
                [100.0, 0.0],
                [101.0, 0.0],
                [101.0, 1.0],
                [100.0, 1.0],
                [100.0, 0.0]
            ],
            [
                [100.2, 0.2],
                [100.2, 0.8],
                [100.8, 0.8],
                [100.8, 0.2],
                [100.2, 0.2]
            ]
        ]
    ]
}

arangosh Examples

Ensures that a geo index exists:

collection.ensureIndex({ type: "geo", fields: [ "location" ] })

Creates a geospatial index on all documents using location as the path to the coordinates. The value of the attribute has to be an array with at least two numeric values. The array must contain the latitude (first value) and the longitude (second value).

All documents that do not have the attribute path, or that have a non-conforming value in it, are excluded from the index.

A geo index is implicitly sparse, and there is no way to control its sparsity.

The index does not provide a unique option because it would be of limited use. It would only prevent identical coordinates from being inserted; a location that is even slightly different (say, 1 inch or 1 cm off) would be considered unique and not a duplicate, although it probably should be. The desired threshold for detecting duplicates may vary from project to project (even including how the distance is calculated) and needs to be implemented in the application layer as needed. You can write a Foxx service for this purpose and use the AQL geo functions, supported by a geo index, to find nearby coordinates.
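As a hedged illustration of such an application-level check, the sketch below treats a new location as a duplicate if an existing one lies within a project-specific threshold. findNearDuplicate and haversineMeters are hypothetical helpers; in a real service the candidate locations would come from an indexed AQL query (for example a GEO_DISTANCE filter) rather than from an in-memory array.

```javascript
// Hypothetical application-layer duplicate check: a candidate location counts
// as a duplicate if an existing location lies within thresholdMeters of it.
// Positions are [longitude, latitude]; distances use the haversine formula on
// a 6371 km sphere, an approximation chosen for this sketch.
const MEAN_RADIUS_M = 6371000;
const radians = (deg) => (deg * Math.PI) / 180;

function haversineMeters([lng1, lat1], [lng2, lat2]) {
  const a =
    Math.sin(radians(lat2 - lat1) / 2) ** 2 +
    Math.cos(radians(lat1)) * Math.cos(radians(lat2)) * Math.sin(radians(lng2 - lng1) / 2) ** 2;
  return 2 * MEAN_RADIUS_M * Math.asin(Math.min(1, Math.sqrt(a)));
}

// Returns the first existing position within the threshold, or null.
function findNearDuplicate(existingPositions, candidate, thresholdMeters) {
  const match = existingPositions.find(
    (position) => haversineMeters(position, candidate) <= thresholdMeters
  );
  return match === undefined ? null : match;
}

const stored = [[6.96, 50.94], [13.4, 52.52]];
// Roughly 0.7 m east of the first stored position: a duplicate at a 1 m threshold.
console.log(findNearDuplicate(stored, [6.96001, 50.94], 1)); // [ 6.96, 50.94 ]
// Roughly 700 m away: not a duplicate at a 1 m threshold.
console.log(findNearDuplicate(stored, [6.97, 50.94], 1));    // null
```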

If the index is successfully created, an object with the index details, including the index identifier, is returned.

To create a geo index on an array attribute that contains the longitude first, set the geoJson attribute to true. This corresponds to the format described in RFC 7946 Position.

collection.ensureIndex({ type: "geo", fields: [ "location" ], geoJson: true })

To create a geo-spatial index on all documents using latitude and longitude as separate attribute paths, two paths need to be specified in the fields array:

collection.ensureIndex({ type: "geo", fields: [ "latitude", "longitude" ] })


Examples

Create a geo index for an array attribute:

arangosh> db.geo.ensureIndex({ type: "geo", fields: [ "loc" ] });
{ 
  "bestIndexedLevel" : 17, 
  "fields" : [ 
    "loc" 
  ], 
  "geoJson" : false, 
  "id" : "geo/76775", 
  "isNewlyCreated" : true, 
  "maxNumCoverCells" : 8, 
  "name" : "idx_1707084125476749312", 
  "sparse" : true, 
  "type" : "geo", 
  "unique" : false, 
  "worstIndexedLevel" : 4, 
  "code" : 201 
}

Create a geo index for separate latitude and longitude attributes:

arangosh> db.geo2.ensureIndex({ type: "geo", fields: [ "location.latitude", "location.longitude" ] });
{ 
  "bestIndexedLevel" : 17, 
  "fields" : [ 
    "location.latitude", 
    "location.longitude" 
  ], 
  "geoJson" : false, 
  "id" : "geo2/76786", 
  "isNewlyCreated" : true, 
  "maxNumCoverCells" : 8, 
  "name" : "idx_1707084125479895041", 
  "sparse" : true, 
  "type" : "geo", 
  "unique" : false, 
  "worstIndexedLevel" : 4, 
  "code" : 201 
}

Use geo index with AQL SORT statement:

arangosh> db.geoSort.ensureIndex({ type: "geo", fields: [ "latitude", "longitude" ] });
arangosh> for (i = -90;  i <= 90;  i += 10) {
........>     for (j = -180; j <= 180; j += 10) {
........>         db.geoSort.save({ name : "Name/" + i + "/" + j, latitude : i, longitude : j });
........>     }
........> }
arangosh> var query = "FOR doc in geoSort SORT DISTANCE(doc.latitude, doc.longitude, 0, 0) LIMIT 5 RETURN doc"
arangosh> db._explain(query, {}, {colors: false});
{ 
  "bestIndexedLevel" : 17, 
  "fields" : [ 
    "latitude", 
    "longitude" 
  ], 
  "geoJson" : false, 
  "id" : "geoSort/78218", 
  "isNewlyCreated" : true, 
  "maxNumCoverCells" : 8, 
  "name" : "idx_1707084125733650432", 
  "sparse" : true, 
  "type" : "geo", 
  "unique" : false, 
  "worstIndexedLevel" : 4, 
  "code" : 201 
}
Query String (86 chars, cacheable: true):
 FOR doc in geoSort SORT DISTANCE(doc.latitude, doc.longitude, 0, 0) LIMIT 5 RETURN doc

Execution plan:
 Id   NodeType        Est.   Comment
  1   SingletonNode      1   * ROOT
  7   IndexNode        703     - FOR doc IN geoSort   /* geo index scan */    
  5   LimitNode          5       - LIMIT 0, 5
  6   ReturnNode         5       - RETURN doc

Indexes used:
 By   Name                      Type   Collection   Unique   Sparse   Selectivity   Fields                        Ranges
  7   idx_1707084125733650432   geo    geoSort      false    true             n/a   [ `latitude`, `longitude` ]   (GEO_DISTANCE([ 0, 0 ], [ doc.`longitude`, doc.`latitude` ]) < "unlimited")

Optimization rules applied:
 Id   RuleName
  1   geo-index-optimizer
  2   remove-unnecessary-calculations-2

Optimization rules with highest execution times:
 RuleName                                    Duration [s]
 geo-index-optimizer                              0.00002
 reduce-extraction-to-projection                  0.00000
 replace-function-with-index                      0.00000
 use-indexes                                      0.00000
 remove-unnecessary-calculations-2                0.00000

41 rule(s) executed, 1 plan(s) created


arangosh> db._query(query);
[ 
  { 
    "_key" : "78924", 
    "_id" : "geoSort/78924", 
    "_rev" : "_cuv9dI2--A", 
    "name" : "Name/0/0", 
    "latitude" : 0, 
    "longitude" : 0 
  }, 
  { 
    "_key" : "78998", 
    "_id" : "geoSort/78998", 
    "_rev" : "_cuv9dJq--A", 
    "name" : "Name/10/0", 
    "latitude" : 10, 
    "longitude" : 0 
  }, 
  { 
    "_key" : "78926", 
    "_id" : "geoSort/78926", 
    "_rev" : "_cuv9dI6---", 
    "name" : "Name/0/10", 
    "latitude" : 0, 
    "longitude" : 10 
  }, 
  { 
    "_key" : "78850", 
    "_id" : "geoSort/78850", 
    "_rev" : "_cuv9dI---_", 
    "name" : "Name/-10/0", 
    "latitude" : -10, 
    "longitude" : 0 
  }, 
  { 
    "_key" : "78922", 
    "_id" : "geoSort/78922", 
    "_rev" : "_cuv9dI2--_", 
    "name" : "Name/0/-10", 
    "latitude" : 0, 
    "longitude" : -10 
  } 
]
[object ArangoQueryCursor, count: 5, cached: false, hasMore: false]
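To see why the result above contains the origin followed by four neighbors, the grid can be rebuilt in memory and ranked by distance to (0, 0). This brute-force sketch assumes that DISTANCE uses the haversine formula on a 6371 km sphere. The four neighbors are all about 1,112 km away, so their relative order is a tie and may differ from the ArangoDB output.

```javascript
// Rebuild the 10-degree grid from the arangosh example in memory and rank the
// points by distance to (0, 0): a brute-force version of
//   SORT DISTANCE(doc.latitude, doc.longitude, 0, 0) LIMIT 5
// Assumption: haversine on a sphere with a 6371 km radius.
const EARTH_RADIUS = 6371000;
const toRad = (deg) => (deg * Math.PI) / 180;

// DISTANCE-style argument order: latitude first, then longitude.
function distanceLatLng(lat1, lng1, lat2, lng2) {
  const dLat = toRad(lat2 - lat1);
  const dLng = toRad(lng2 - lng1);
  const a = Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLng / 2) ** 2;
  return 2 * EARTH_RADIUS * Math.asin(Math.min(1, Math.sqrt(a)));
}

const grid = [];
for (let i = -90; i <= 90; i += 10) {
  for (let j = -180; j <= 180; j += 10) {
    grid.push({ name: "Name/" + i + "/" + j, latitude: i, longitude: j });
  }
}

const nearest = grid
  .map((doc) => ({ name: doc.name, dist: distanceLatLng(doc.latitude, doc.longitude, 0, 0) }))
  .sort((a, b) => a.dist - b.dist)
  .slice(0, 5);

// Prints the origin first, then the four neighbors 10 degrees away.
for (const { name, dist } of nearest) {
  console.log(name, Math.round(dist));
}
```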

Use geo index with AQL FILTER statement:

arangosh> db.geoFilter.ensureIndex({ type: "geo", fields: [ "latitude", "longitude" ] });
arangosh> for (i = -90;  i <= 90;  i += 10) {
........>     for (j = -180; j <= 180; j += 10) {
........>         db.geoFilter.save({ name : "Name/" + i + "/" + j, latitude : i, longitude : j });
........>     }
........> }
arangosh> var query = "FOR doc in geoFilter FILTER DISTANCE(doc.latitude, doc.longitude, 0, 0) < 2000 RETURN doc"
arangosh> db._explain(query, {}, {colors: false});
{ 
  "bestIndexedLevel" : 17, 
  "fields" : [ 
    "latitude", 
    "longitude" 
  ], 
  "geoJson" : false, 
  "id" : "geoFilter/76797", 
  "isNewlyCreated" : true, 
  "maxNumCoverCells" : 8, 
  "name" : "idx_1707084125485137920", 
  "sparse" : true, 
  "type" : "geo", 
  "unique" : false, 
  "worstIndexedLevel" : 4, 
  "code" : 201 
}
Query String (89 chars, cacheable: true):
 FOR doc in geoFilter FILTER DISTANCE(doc.latitude, doc.longitude, 0, 0) < 2000 RETURN doc

Execution plan:
 Id   NodeType        Est.   Comment
  1   SingletonNode      1   * ROOT
  6   IndexNode        703     - FOR doc IN geoFilter   /* geo index scan */    
  5   ReturnNode       703       - RETURN doc

Indexes used:
 By   Name                      Type   Collection   Unique   Sparse   Selectivity   Fields                        Ranges
  6   idx_1707084125485137920   geo    geoFilter    false    true             n/a   [ `latitude`, `longitude` ]   (GEO_DISTANCE([ 0, 0 ], [ doc.`longitude`, doc.`latitude` ]) < 2000)

Optimization rules applied:
 Id   RuleName
  1   geo-index-optimizer
  2   remove-unnecessary-calculations-2

Optimization rules with highest execution times:
 RuleName                                    Duration [s]
 geo-index-optimizer                              0.00002
 reduce-extraction-to-projection                  0.00000
 replace-function-with-index                      0.00000
 use-indexes                                      0.00000
 optimize-subqueries                              0.00000

41 rule(s) executed, 1 plan(s) created


arangosh> db._query(query);
[ 
  { 
    "_key" : "77503", 
    "_id" : "geoFilter/77503", 
    "_rev" : "_cuv9c6K---", 
    "name" : "Name/0/0", 
    "latitude" : 0, 
    "longitude" : 0 
  } 
]
[object ArangoQueryCursor, count: 1, cached: false, hasMore: false]
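The same brute-force approach explains the single result of the FILTER query: the threshold is 2000 meters, while every other grid point is at least 10 degrees (roughly 1,112 km) away from (0, 0). As before, the haversine helper and the 6371 km radius are assumptions of this sketch.

```javascript
// Brute-force check of FILTER DISTANCE(doc.latitude, doc.longitude, 0, 0) < 2000
// on the 10-degree grid from the arangosh example.
// Assumption: haversine on a sphere with a 6371 km radius.
const R_METERS = 6371000;
const asRad = (deg) => (deg * Math.PI) / 180;

function distanceLatLngMeters(lat1, lng1, lat2, lng2) {
  const a = Math.sin(asRad(lat2 - lat1) / 2) ** 2 +
    Math.cos(asRad(lat1)) * Math.cos(asRad(lat2)) * Math.sin(asRad(lng2 - lng1) / 2) ** 2;
  return 2 * R_METERS * Math.asin(Math.min(1, Math.sqrt(a)));
}

const gridDocs = [];
for (let i = -90; i <= 90; i += 10) {
  for (let j = -180; j <= 180; j += 10) {
    gridDocs.push({ name: "Name/" + i + "/" + j, latitude: i, longitude: j });
  }
}

const within2km = gridDocs.filter(
  (doc) => distanceLatLngMeters(doc.latitude, doc.longitude, 0, 0) < 2000
);
console.log(within2km.map((doc) => doc.name)); // [ 'Name/0/0' ]
```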