Address Resolution using Pelias
Jun 24, 2022, by Swathy S
Geocoding is the process of taking an address, such as a street address or the name of a place, and transforming it into geographic coordinates (a point given by latitude and longitude) that can be shown on a map.
Reverse geocoding is the opposite of geocoding: given the geographic coordinates of a location, it returns a human-readable address.
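To make the two directions concrete, here is a toy sketch over a tiny, made-up gazetteer (the place names and coordinates are hypothetical). Forward geocoding is a name lookup, and reverse geocoding is treated as "find the nearest known place" using the haversine great-circle distance. A real geocoder such as Pelias searches millions of records and uses polygons and indexes rather than a linear scan, so this only illustrates the idea.

```python
import math

# Made-up gazetteer: place name -> (latitude, longitude).
GAZETTEER = {
    "Example Library": (10.0, 76.0),
    "Example Station": (10.5, 76.2),
    "Example Park": (9.8, 76.5),
}

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def geocode(name):
    """Forward geocoding: place name -> coordinates (None if unknown)."""
    return GAZETTEER.get(name)


def reverse_geocode(point):
    """Reverse geocoding: coordinates -> name of the nearest known place."""
    return min(GAZETTEER, key=lambda name: haversine_km(point, GAZETTEER[name]))


print(geocode("Example Station"))       # (10.5, 76.2)
print(reverse_geocode((10.45, 76.18)))  # Example Station
```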
Pelias is a modular, open-source geocoder built on open data such as OpenStreetMap. It is written in Node.js and uses Elasticsearch to deliver fast and accurate global search. The Pelias architecture has three main components (the importers, the database and the frontend services), along with several smaller subcomponents.
Pelias Architecture Diagram
Pelias uses data importers to get the data required for geocoding. The importers filter, normalise and ingest geographic datasets into the database. Currently, there are five officially supported importers:
- Who’s on First
- OpenAddresses
- OpenStreetMap
- Polylines
- Geonames
Who’s on First
- An importer that loads Who’s on First data into a Pelias Elasticsearch store.
- Who’s on First (WOF) is a gazetteer, that is, a geographical dictionary.
- It supports two types of data: hierarchical data and venues.
- Hierarchy data represents things like cities, countries, counties, towns, etc.
- Venues represent individual places such as the Taj Mahal or a bus station. Venue data is split by country, and sometimes by region within a country.
- Pelias can compute the admin hierarchy (county, region, country, etc.) from the imported data.
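One simple way to derive an admin hierarchy is to follow each record's parent link upwards until there is no parent left. The sketch below assumes simplified, made-up records that borrow the Who’s on First property names `wof:name`, `wof:placetype` and `wof:parent_id` (with `-1` meaning "no parent"); the IDs and names are hypothetical, and this is not necessarily how the Pelias importer computes the hierarchy.

```python
# Simplified, made-up records using Who's on First style property names.
RECORDS = {
    1: {"wof:name": "Exampleland", "wof:placetype": "country", "wof:parent_id": -1},
    2: {"wof:name": "North Region", "wof:placetype": "region", "wof:parent_id": 1},
    3: {"wof:name": "Lake County", "wof:placetype": "county", "wof:parent_id": 2},
    4: {"wof:name": "Exampletown", "wof:placetype": "locality", "wof:parent_id": 3},
}


def admin_hierarchy(wof_id, records=RECORDS):
    """Return {placetype: name} for a record and all of its ancestors."""
    hierarchy = {}
    seen = set()
    current_id = wof_id
    while current_id in records and current_id not in seen:
        seen.add(current_id)  # guard against accidental parent cycles
        record = records[current_id]
        hierarchy[record["wof:placetype"]] = record["wof:name"]
        current_id = record["wof:parent_id"]  # -1 is not a key, so the walk stops
    return hierarchy


print(admin_hierarchy(4))
# {'locality': 'Exampletown', 'county': 'Lake County', 'region': 'North Region', 'country': 'Exampleland'}
```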
OpenAddresses
- This importer imports hundreds of millions of addresses worldwide, collected by OpenAddresses from authoritative government sources.
- OpenAddresses is a global collection of address data sources that are open and free to use.
OpenStreetMap
- This importer loads OpenStreetMap data into the database for use by Pelias.
- It first filters out the data that is relevant to geocoding, then transforms that data to match the Pelias data model and augments it.
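The filter-then-transform step can be sketched as two small functions. The OSM elements below are made up and shaped roughly like nodes in OSM JSON output (an id, a position and a dictionary of tags such as `name` or `addr:housenumber`). The output document shape is a simplified, hypothetical approximation of a Pelias record; the real Pelias schema has more fields and the real importer applies many more rules.

```python
# Made-up OSM nodes, shaped roughly like OSM JSON output (id, lat, lon, tags).
ELEMENTS = [
    {"type": "node", "id": 101, "lat": 10.01, "lon": 76.31,
     "tags": {"name": "Example Cafe", "amenity": "cafe"}},
    {"type": "node", "id": 102, "lat": 10.02, "lon": 76.32,
     "tags": {"addr:housenumber": "12", "addr:street": "Main Street"}},
    {"type": "node", "id": 103, "lat": 10.03, "lon": 76.33,
     "tags": {"highway": "street_lamp"}},  # no name, no address: dropped
]


def is_relevant(element):
    """Filter step: keep named features and features that carry an address."""
    tags = element.get("tags", {})
    return "name" in tags or "addr:housenumber" in tags


def to_document(element):
    """Transform step: map a node onto a simplified, hypothetical document shape."""
    tags = element["tags"]
    number = tags.get("addr:housenumber")
    street = tags.get("addr:street")
    if "name" in tags:
        layer, name = "venue", tags["name"]
    else:
        layer, name = "address", f"{number} {street or ''}".strip()
    return {
        "source": "openstreetmap",
        "layer": layer,
        "source_id": f"{element['type']}/{element['id']}",
        "name": name,
        "center_point": {"lat": element["lat"], "lon": element["lon"]},
        "address_parts": {"number": number, "street": street},
    }


documents = [to_document(e) for e in ELEMENTS if is_relevant(e)]
for doc in documents:
    print(doc["layer"], doc["name"])
# venue Example Cafe
# address 12 Main Street
```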
Indexing OpenStreetMap (OSM)
- Streets, addresses and POIs are reverse geocoded into Quattroshapes using PostGIS.
- Streets are stored as a GeoJSON LineString.
- For addresses and POIs, a center point (a GeoJSON Point) is stored, even when the feature itself is a polygon.
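Storing a center point for a polygon feature means reducing the polygon to a single coordinate. A common choice is the area-weighted centroid computed with the shoelace formula, sketched below for the outer ring of a GeoJSON Polygon. Whether Pelias uses exactly this centroid is an assumption; the sketch also treats longitude and latitude as planar coordinates, which is a reasonable approximation only for small features such as buildings or parks.

```python
def polygon_centroid(ring):
    """Area-weighted centroid of a simple polygon ring of (lon, lat) pairs."""
    if ring[0] == ring[-1]:
        ring = ring[:-1]  # GeoJSON rings repeat the first point at the end
    n = len(ring)
    area2 = cx = cy = 0.0  # area2 is twice the signed area
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        # Degenerate (zero-area) ring: fall back to the average vertex.
        return (sum(x for x, _ in ring) / n, sum(y for _, y in ring) / n)
    return (cx / (3 * area2), cy / (3 * area2))


def center_point(geometry):
    """Return a GeoJSON Point for a Point or Polygon geometry."""
    if geometry["type"] == "Point":
        return geometry
    if geometry["type"] == "Polygon":
        lon, lat = polygon_centroid(geometry["coordinates"][0])  # outer ring only
        return {"type": "Point", "coordinates": [lon, lat]}
    raise ValueError(f"unsupported geometry type: {geometry['type']}")


square = {"type": "Polygon", "coordinates": [[[0, 0], [2, 0], [2, 2], [0, 2], [0, 0]]]}
print(center_point(square))  # {'type': 'Point', 'coordinates': [1.0, 1.0]}
```

Note that the centroid of a strongly concave polygon (an L-shaped building, say) can fall outside the polygon itself, which is one reason some systems prefer a point guaranteed to lie inside.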
Polylines
- This importer loads road network data from a list of encoded polyline strings.
- An encoded polyline is created by converting a series of coordinates into a single string using a lossy compression algorithm.
- The imported data is mainly from OpenStreetMap.
- The importer includes utilities for downloading and cleaning up the data before import.
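The usual scheme for this kind of string is Google's Encoded Polyline Algorithm Format, sketched below. Coordinates are rounded to a fixed number of decimal places (the lossy step), each point is stored as a difference from the previous one, and each difference is written as 5-bit chunks mapped onto printable ASCII. Google's description uses 5 decimal places, while some routing tools use 6, so `precision` is a parameter here; which precision a given Pelias input file uses is an assumption you should check.

```python
def _encode_value(value):
    """Encode one signed integer delta as printable ASCII characters."""
    # Shift left one bit and invert negatives so the sign ends up in bit 0.
    value = ~(value << 1) if value < 0 else value << 1
    chars = []
    while value >= 0x20:
        # Emit 5 bits at a time; 0x20 flags "more chunks follow".
        chars.append(chr((0x20 | (value & 0x1F)) + 63))
        value >>= 5
    chars.append(chr(value + 63))
    return "".join(chars)


def encode_polyline(coords, precision=5):
    """Encode (lat, lon) pairs; rounding to `precision` decimals is the lossy step."""
    factor = 10 ** precision
    out = []
    prev_lat = prev_lon = 0
    for lat, lon in coords:
        lat_i, lon_i = round(lat * factor), round(lon * factor)
        # Store differences from the previous point: small numbers, short strings.
        out.append(_encode_value(lat_i - prev_lat))
        out.append(_encode_value(lon_i - prev_lon))
        prev_lat, prev_lon = lat_i, lon_i
    return "".join(out)


def decode_polyline(text, precision=5):
    """Inverse of encode_polyline: returns a list of (lat, lon) floats."""
    factor = 10 ** precision
    coords = []
    index = lat = lon = 0
    while index < len(text):
        deltas = []
        for _ in range(2):  # one latitude delta, then one longitude delta
            shift = result = 0
            while True:
                chunk = ord(text[index]) - 63
                index += 1
                result |= (chunk & 0x1F) << shift
                shift += 5
                if chunk < 0x20:
                    break
            deltas.append(~(result >> 1) if result & 1 else result >> 1)
        lat += deltas[0]
        lon += deltas[1]
        coords.append((lat / factor, lon / factor))
    return coords


line = [(38.5, -120.2), (40.7, -120.95), (43.252, -126.453)]
encoded = encode_polyline(line)
print(encoded)  # _p~iF~ps|U_ulLnnqC_mqNvxq`@
print(decode_polyline(encoded))
```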
Geonames
- Geonames is a geographical database that covers the whole world and offers web services that let users extract information about different places.
- Pelias does not use the admin hierarchy in OpenStreetMap; instead, it uses an admin hierarchy from Geonames and reverse geocodes OpenStreetMap data into it.
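GeoNames publishes its data as tab-separated dumps with one place per line and a fixed set of 19 columns, as described in the GeoNames dump readme; the `admin1_code` to `admin4_code` columns are what tie a place to its administrative areas. Below is a minimal parsing sketch. The sample line is made up, and the simplified output record is a hypothetical shape, not the one the Pelias Geonames importer produces.

```python
# Column order of the GeoNames "geoname" table in its tab-separated dumps.
GEONAMES_COLUMNS = [
    "geonameid", "name", "asciiname", "alternatenames", "latitude", "longitude",
    "feature_class", "feature_code", "country_code", "cc2", "admin1_code",
    "admin2_code", "admin3_code", "admin4_code", "population", "elevation",
    "dem", "timezone", "modification_date",
]


def parse_geonames(text):
    """Yield simplified records from GeoNames dump text (one place per line)."""
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) != len(GEONAMES_COLUMNS):
            continue  # skip blank or malformed lines
        row = dict(zip(GEONAMES_COLUMNS, fields))
        yield {
            "id": int(row["geonameid"]),
            "name": row["name"],
            "lat": float(row["latitude"]),
            "lon": float(row["longitude"]),
            "feature": f"{row['feature_class']}.{row['feature_code']}",
            "country": row["country_code"],
            "admin1": row["admin1_code"],
            "population": int(row["population"] or 0),
        }


# One made-up line, built field by field so the tabs are explicit.
SAMPLE = "\t".join([
    "1234567", "Exampleville", "Exampleville", "Example Town", "10.12345", "76.54321",
    "P", "PPL", "IN", "", "13", "", "", "", "54321", "", "12", "Asia/Kolkata", "2022-01-01",
])

for record in parse_geonames(SAMPLE):
    print(record)
```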
Elasticsearch is the database that powers fast search results. It stores all of the filtered datasets.
- An open-source, RESTful, distributed, readily scalable full-text search engine built on Apache Lucene.
- It performs extremely fast full-text searches and exposes an extensive API for querying the data.
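To make the Elasticsearch side more concrete, here is a sketch of a query body that combines a full-text match with a geographic bias, using Elasticsearch's `function_score` query with a `gauss` decay function on a `geo_point` field. The field names `name.default` and `center_point` are assumptions modelled on Pelias documents, and the 50 km scale is arbitrary; this is not the query Pelias itself generates. The resulting body could be sent to an index's `_search` endpoint.

```python
import json


def build_search_body(text, focus_lat=None, focus_lon=None, size=10):
    """Elasticsearch query body: a name match, optionally biased towards a point."""
    query = {"match": {"name.default": {"query": text}}}  # assumed field name
    if focus_lat is not None and focus_lon is not None:
        query = {
            "function_score": {
                "query": query,
                "functions": [{
                    # Score decays with distance from the focus point.
                    "gauss": {
                        "center_point": {  # assumed geo_point field
                            "origin": {"lat": focus_lat, "lon": focus_lon},
                            "scale": "50km",
                        }
                    }
                }],
                "boost_mode": "multiply",  # text score x distance factor
            }
        }
    return {"size": size, "query": query}


print(json.dumps(build_search_body("main street", 10.0, 76.3), indent=2))
```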
The frontend services are where the actual geocoding happens; they include the components that users interact with when making geocoding queries. The services are:
API: This service queries Elasticsearch and the other Pelias services, and returns results in GeoJSON format. Requests are made over HTTP.
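A client talks to the API with plain HTTP GET requests. The sketch below builds URLs for the `/v1/search` endpoint (with the `text` and `size` parameters) and the `/v1/reverse` endpoint (with `point.lat` and `point.lon`), then pulls labels and coordinates out of a GeoJSON FeatureCollection. The base URL is an assumption to adjust for your deployment, and the sample response is made up and trimmed to the fields used here.

```python
import json
from urllib.parse import urlencode

# Assumption: base URL of a running Pelias API; adjust host and port as needed.
PELIAS_BASE_URL = "http://localhost:4000"


def search_url(text, size=5):
    """URL for a forward-geocoding request to the /v1/search endpoint."""
    return f"{PELIAS_BASE_URL}/v1/search?{urlencode({'text': text, 'size': size})}"


def reverse_url(lat, lon):
    """URL for a reverse-geocoding request to the /v1/reverse endpoint."""
    return f"{PELIAS_BASE_URL}/v1/reverse?{urlencode({'point.lat': lat, 'point.lon': lon})}"


def extract_results(geojson_text):
    """Return (label, lat, lon) for each feature in a GeoJSON FeatureCollection."""
    data = json.loads(geojson_text)
    results = []
    for feature in data.get("features", []):
        lon, lat = feature["geometry"]["coordinates"][:2]  # GeoJSON order is lon, lat
        results.append((feature["properties"].get("label"), lat, lon))
    return results


# A made-up response, trimmed to the fields used above.
SAMPLE_RESPONSE = json.dumps({
    "type": "FeatureCollection",
    "features": [{
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [76.3, 10.0]},
        "properties": {"label": "Example Cafe, Exampletown"},
    }],
})

print(search_url("Main Street"))
print(reverse_url(10.0, 76.3))
print(extract_results(SAMPLE_RESPONSE))
```

Against a live instance, the response text could be fetched with `urllib.request.urlopen(search_url("...")).read()` and passed to `extract_results`. Note that GeoJSON stores coordinates as longitude first, which is an easy thing to get backwards.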
Placeholder: This service captures the relationships between administrative areas (city, state, country) and handles this kind of relational data well, which is where Elasticsearch struggles.
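Why hierarchy matters is easiest to see with an ambiguous name. The toy sketch below uses a made-up gazetteer in which two places are both called "Springfield". A query is resolved by matching the place name at the start and then requiring every remaining comma-separated part to be one of that place's ancestors. This only illustrates relational matching; it is not how Placeholder is implemented.

```python
# Made-up gazetteer: id -> (name, parent_id). Two places share the name
# "Springfield"; only their parents tell them apart.
PLACES = {
    1: ("Exampleland", None),
    2: ("North Region", 1),
    3: ("South Region", 1),
    4: ("Springfield", 2),
    5: ("Springfield", 3),
}


def ancestor_names(place_id):
    """Lower-cased names of every ancestor, nearest first."""
    names = []
    parent = PLACES[place_id][1]
    while parent is not None:
        names.append(PLACES[parent][0].lower())
        parent = PLACES[parent][1]
    return names


def resolve(query):
    """Ids of places whose name starts the query and whose ancestors include
    every remaining comma-separated part of it."""
    q = query.lower().strip()
    matches = []
    for place_id, (name, _) in PLACES.items():
        name_l = name.lower()
        if not q.startswith(name_l):
            continue
        parts = [p.strip() for p in q[len(name_l):].split(",") if p.strip()]
        ancestors = ancestor_names(place_id)
        if all(p in ancestors for p in parts):
            matches.append(place_id)
    return matches


print(resolve("Springfield"))                # [4, 5]: ambiguous
print(resolve("Springfield, South Region"))  # [5]
```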
Point-in-Polygon (PIP): The PIP service is the only Pelias component that actually understands polygon geometries, and it is very good at quickly determining which admin area polygons a given point lies in. Fast PIP lookups are important for reverse geocoding.
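The core question the PIP service answers can be sketched with the textbook even-odd (ray casting) test, plus a bounding-box check that cheaply rejects most polygons before the exact test runs. The polygons below are made-up squares in (lon, lat) order, holes and multipolygons are ignored, and the Pelias PIP service's own implementation and indexing will differ; this is a sketch of the technique, not of that service.

```python
def bounding_box(ring):
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    return min(xs), min(ys), max(xs), max(ys)


def point_in_ring(x, y, ring):
    """Even-odd (ray casting) test: count edges crossed by a ray going right."""
    inside = False
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]
        if (y0 > y) != (y1 > y):  # edge straddles the ray's latitude
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x < x_cross:
                inside = not inside
    return inside


def build_index(polygons):
    """Precompute bounding boxes once: [(name, bbox, ring), ...]."""
    return [(name, bounding_box(ring), ring) for name, ring in polygons]


def containing_areas(lon, lat, index):
    """Names of all admin areas whose polygon contains the point."""
    hits = []
    for name, (min_x, min_y, max_x, max_y), ring in index:
        if not (min_x <= lon <= max_x and min_y <= lat <= max_y):
            continue  # cheap rejection before the exact test
        if point_in_ring(lon, lat, ring):
            hits.append(name)
    return hits


index = build_index([
    ("Exampleland (country)", [(0, 0), (10, 0), (10, 10), (0, 10)]),
    ("North Region (region)", [(0, 5), (10, 5), (10, 10), (0, 10)]),
])
print(containing_areas(3.0, 7.0, index))  # both areas
print(containing_areas(3.0, 2.0, index))  # country only
```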
Libpostal: Pelias uses the libpostal project, which applies machine learning to parse and normalise street addresses. A Go service built by the Who’s on First project makes this fast and efficient.
Interpolation: This service knows about addresses and streets. Using that knowledge, it supplements the known addresses stored directly in Elasticsearch and returns fairly accurate estimated address results for many queries that the original data alone could not answer.
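The basic idea behind estimating a missing address can be sketched as linear interpolation: find the nearest known house numbers on either side of the requested one and place it proportionally between them. The coordinates below are made up, and the sketch uses a straight line between the two known points; a fuller approach would follow the street's actual geometry, and the Pelias interpolation service handles many more cases than this.

```python
def interpolate_address(target, known):
    """Estimate the (lon, lat) of house number `target` on one street.

    `known` holds (house_number, (lon, lat)) pairs for addresses that exist in
    the data. Returns None if target is outside the known range.
    """
    known = sorted(known)
    for (n0, (x0, y0)), (n1, (x1, y1)) in zip(known, known[1:]):
        if n0 <= target <= n1 and n0 != n1:
            t = (target - n0) / (n1 - n0)  # fraction of the way from n0 to n1
            return (x0 + t * (x1 - x0), y0 + t * (y1 - y0))
    return None


known_addresses = [(2, (76.3000, 10.0000)), (10, (76.3008, 10.0000))]
print(interpolate_address(6, known_addresses))   # midway between no. 2 and no. 10
print(interpolate_address(14, known_addresses))  # None: outside the known range
```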
How does Pelias work?
Pelias searches over millions of records, each representing a location on Earth, by combining the fast full-text search techniques of Elasticsearch with knowledge of geography.
- Elasticsearch is a full-text search engine with geospatial support, and it offers very fast autocomplete.
- For each document to be included in autocomplete, three things are stored:
  - An array of possible inputs, such as “1 Main Street New York”, covering the different ways users might type the query.
  - The output string, which is the canonical string suggested to the user.
  - An optional JSON payload that is returned to the user along with the output.
- When we type a query into Pelias, autocomplete shows suggestions. We can select one of them and use the data in its payload to show the result directly on the map, or use its output string to run a full search.
- Population counts are also used to weight the results and improve ranking.
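The three stored parts and the population weighting can be illustrated with a small in-memory sketch: each document has several `inputs`, one canonical `output` and a `payload`, a prefix is matched against any of the inputs, and matches are ranked by population. The places and numbers are made up, and this is a toy stand-in, not Elasticsearch's completion suggester or Pelias's ranking.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class CompletionDoc:
    inputs: List[str]  # ways a user might type the place (lower-case here)
    output: str  # canonical string suggested to the user
    payload: Dict = field(default_factory=dict)  # optional data returned with it
    population: int = 0  # used as a ranking weight


DOCS = [
    CompletionDoc(["springfield", "springfield north region"],
                  "Springfield, North Region", {"id": 4}, population=50_000),
    CompletionDoc(["springfield", "springfield south region"],
                  "Springfield, South Region", {"id": 5}, population=150_000),
    CompletionDoc(["springdale"], "Springdale", {"id": 6}, population=2_000),
]


def autocomplete(prefix: str, docs: List[CompletionDoc], limit: int = 5) -> List[Tuple[str, Dict]]:
    """(output, payload) for docs with any input starting with prefix, most populous first."""
    prefix = prefix.lower().strip()
    hits = [d for d in docs if any(i.startswith(prefix) for i in d.inputs)]
    hits.sort(key=lambda d: d.population, reverse=True)
    return [(d.output, d.payload) for d in hits[:limit]]


for output, payload in autocomplete("spring", DOCS):
    print(output, payload)
```

The payload (here just a hypothetical `id`) is what lets a client jump straight to a result on the map, while the output string can be sent as the `text` of a full search.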
Disclaimer: The opinions expressed in this article are those of the author(s) and do not necessarily reflect the positions of Dexlock.