About Geocoding

A geocoded table is a table where every record has a location given in latitude and longitude, using standard decimal degrees notation. Unless each record is geocoded with a latitude and longitude location, Manifold cannot know where that record is located. Once a table is geocoded, it may be used to create drawings.

 

images\sc_nongeocoded.gif

 

For example, the table above is not geocoded. It lists the names of towns in the United States but there is no way to tell from the table exactly where the towns are located. If we were to try to draw points on a paper map for each town we would not know where to place the points. If a table is not geocoded, it cannot be used to create a drawing in Manifold either, because Manifold also would not know where to put the points.

 

images\sc_tv_editcoords03.gif

 

In contrast, the table above is geocoded. Each record now has a latitude and longitude location given in decimal degrees notation. We could use these values to draw a point for each town on a paper map of the United States, and Manifold could use this table to create a drawing.

 

images\sc_tv_editcoords02.gif

 

If a table is geocoded, it can be used to create a drawing, which in turn can be used in a map like the illustration above. The positions of the towns immediately convey a visual impression of their locations that one does not get from a table. Displaying data at the right location is a large part of the power of a GIS like Manifold, which is why we want our data to be geocoded: geocoded data can be displayed in a map right away.

 

The problem is that a lot of the important data sets we deal with, whether they are lists of customer addresses or lists of oil wells or lists of fire hydrants that need maintenance, are not geocoded. The central problem for many GIS users is getting their data geocoded. Depending on the contents of the table, geocoding the table can be a reasonably straightforward process or it can be very difficult or even impossible.

 

Let's take a look at three geocoding tasks to see what approaches to geocoding are possible in different cases. We will look at geocoding a table of towns, geocoding a table of fire hydrants and geocoding a table of street addresses.

 

Geocoding a Table of Towns

 

Suppose we have a table with town names like the first example above. How can we geocode it? In the simplest case we look up the latitude and longitude of each town in a reference book or atlas and we add the latitude and longitude to the table for each record by hand. A better way is to use the power of Manifold to automate the process.

 

If we have a table somewhere that already lists latitudes and longitudes for towns, we can extract that information from it automatically. Because many geocoded tables of populated places can be downloaded free from the Internet, geocoding a table of place names is usually a straightforward matter.

 

There are three techniques that are most frequently used:

 

·      If we have another database table that contains the town name and latitude and longitude, we could use SQL facilities such as a JOIN to combine the two tables via a relation using a key field such as the name of the town.

·      If we have a drawing that shows points for cities in the United States (such as a drawing of populated places) we can geocode the table using the drawing as a guide with Manifold's spatial Match tool. See the Spatial Geocoding with Match topic for more on this tool.

·      If we have the Manifold Geocoding Tools package installed, we can geocode using the town name by following the fast and simple procedures in the Street Address Geocoding topic.
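As a rough illustration of the first technique, the sketch below uses Python's built-in sqlite3 module in place of Manifold's own SQL. The table names, column names and sample rows are hypothetical, and the coordinates are approximate, illustrative values. It joins on both town name and state, because many US town names (Springfield, for example) occur in more than one state; a LEFT JOIN keeps towns that have no match so they can be reviewed by hand.

```python
import sqlite3

# Hypothetical tables: towns to geocode, and a downloaded gazetteer of
# populated places with approximate, illustrative coordinates.
towns = [("Springfield", "IL"), ("Boulder", "CO"), ("Nowhere", "TX")]
places = [
    ("Springfield", "IL", 39.80, -89.64),
    ("Springfield", "MA", 42.10, -72.59),
    ("Boulder", "CO", 40.02, -105.27),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE towns (name TEXT, state TEXT)")
conn.execute("CREATE TABLE places (name TEXT, state TEXT, lat REAL, lon REAL)")
conn.executemany("INSERT INTO towns VALUES (?, ?)", towns)
conn.executemany("INSERT INTO places VALUES (?, ?, ?, ?)", places)

# Join on name AND state: town names alone are not unique keys.
# LEFT JOIN keeps unmatched towns, with NULL (None) coordinates.
rows = conn.execute(
    """
    SELECT t.name, t.state, p.lat, p.lon
    FROM towns AS t
    LEFT JOIN places AS p ON p.name = t.name AND p.state = t.state
    ORDER BY t.rowid
    """
).fetchall()

for row in rows:
    print(row)
```

Towns that come back with None coordinates are the ones still needing attention, for example by looking them up manually or trying one of the other two techniques.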

 

Geocoding a Table of Fire Hydrants

 

Suppose our town would like to create a GIS database of all fire hydrants in the town. We plan to use the power of GIS to help keep track of the status of all fire hydrants and to help plan regular maintenance, cleaning, flushing of water systems and so on. Let's say we have inherited a database of fire hydrants that provides an identification number for each hydrant, some status information on the hydrant and a "location" field that consists of a text comment noting which street the hydrant is on and which cross street is nearest. Our task is to geocode the table with the latitude and longitude location of each hydrant.

 

In the United States, the simplest way to accomplish this task is to connect a portable, WAAS-enabled GPS device to a laptop running Manifold, turn on the GPS Console and then drive to the location of each fire hydrant. With Instant Data turned on, we place a point at the location of each fire hydrant and record the hydrant's identification number for that point. The result of this process is a drawing of points in which each point marks the location of a single fire hydrant. In addition to the object ID field, the drawing's table has only one data attribute field, the identification number of the hydrant. We can then use this drawing together with Manifold's Match tool to geocode the hydrants database table, using the identification number fields as key fields.
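The final Match step amounts to copying coordinates from the drawing onto table records that share the same key value. The Python sketch below illustrates only that idea; it is not Manifold's Match implementation, and the field names, hydrant IDs and coordinates are hypothetical. It also reports hydrants that have no matching point, which in practice is the list of hydrants the field crew has not yet visited.

```python
# Hypothetical hydrant records inherited from the old database (no coordinates).
hydrants = [
    {"hydrant_id": "H-101", "status": "OK", "location": "Oak St at 3rd Ave"},
    {"hydrant_id": "H-102", "status": "Leaking", "location": "Oak St at 4th Ave"},
    {"hydrant_id": "H-103", "status": "OK", "location": "Elm St at 1st Ave"},
]

# Points captured in the field: hydrant ID -> (latitude, longitude).
gps_points = {
    "H-101": (37.45210, -122.18120),
    "H-102": (37.45285, -122.18040),
}

def match_by_key(records, points, key):
    """Copy coordinates onto each record whose key value appears in points.

    Returns the key values that had no matching point.
    """
    unmatched = []
    for rec in records:
        coords = points.get(rec[key])
        if coords is None:
            unmatched.append(rec[key])
            rec["latitude"] = rec["longitude"] = None
        else:
            rec["latitude"], rec["longitude"] = coords
    return unmatched

missing = match_by_key(hydrants, gps_points, "hydrant_id")
print(missing)  # ['H-103']
```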

 

Although recording the locations of all fire hydrants in an entire town this way requires a substantial amount of driving, the process goes very rapidly when the GPS Console and Instant Data are used. WAAS-enabled GPS devices can achieve 2-meter (about 6.5 feet) accuracy, which is sufficient to locate fire hydrants.

 

Unfortunately, WAAS is generally available within the United States only. In regions outside the United States GPS devices will provide only 15-meter (about 50 feet) accuracy by default, which many people would not consider sufficient accuracy for mapping fire hydrants. 15-meter accuracy may be fine for locating bridges, which are large structures that are easily found when one is positioned within 15 meters of them, but in the case of smaller objects such as fire hydrants, especially if they are to be placed on digital maps in relation to features such as buried pipes, one normally would like better accuracy.

 

One way to accomplish this geocoding task outside of the United States would be to drive the city streets and manually mark on a paper map the locations of all hydrants. We could then scan the paper map, georegister the resultant image and then use Tracing to create a drawing of points that show the location of each hydrant. We could then enter the identification number for each hydrant into a data field for each point and then once again use Match to geocode the database table using the identification number as a key field.

 

Another alternative might be to acquire an aerial photograph of sufficient resolution that hydrants are visible, to scan in the photograph, georegister it and then use tracing to create a drawing of hydrant points and Match to geocode based on the drawing. Although overhead photography is probably not very practical in the case of fire hydrants (which would be obscured by trees in many cities) it is a very practical way of geocoding other infrastructure items, such as bridges or electrical transmission towers. Panning and zooming within a map that contains an image layer and then clicking to create a point at various locations in an overlying drawing is a very fast process.

 

Note that the task of geocoding a table of fire hydrants is directly analogous to the task of geocoding a table of oil wells, a table of monitoring stations in a forest or, for that matter, any table of items whose location is not known. In all such cases we must determine the latitude and longitude location of each item either by physically measuring it with a GPS, by marking the location accurately on a map, or by determining the location from an aerial photograph. If the items to be geocoded are easy to reach and a GPS is available, the geocoding process might be very straightforward. If they are far away and there is no aerial photograph or other map that can be used, it could well be impossible to find their locations and thus to geocode the table.

 

Geocoding a Table of Street Addresses

 

If we had a table of street addresses like the one below we could not plot these on a map because the table is not yet geocoded. Without a latitude and longitude location for each record we would not know where to place it on a map.

 

images\tbl_sushi_addresses_01.gif

 

Finding a latitude and longitude location for a given street address is called street address geocoding. Without a latitude and longitude location for a street address, no GIS package knows where that address really is.

 

It is easy to make the conceptual mistake of thinking of a street address as an exactly defined location, the same as a latitude/longitude location. However, that mistake arises mainly from how people use addresses to find locations for the delivery of mail or to go to a particular restaurant or other location. Street addresses, of course, do not really convey an exact latitude and longitude for the address. They simply provide a means by which a postal carrier or someone else physically traversing the streets can find a particular address.

 

To find an address we have to find the street (with the help of a map if we don't know the town), orient ourselves to the address system used on that street and then locate the address. As anyone who has tried to find an out-of-sequence address in an unfamiliar town knows, there is a great difference between hunting down a particular street address and going directly to a latitude/longitude location.

 

images\dwg_sushi_addresses_01.gif

 

It is one thing to be able to find a given street address by physically going there (perhaps with the help of a local street map) and quite another to plot a table of street addresses, such as the table of restaurant addresses, on a map as seen above without ever going to the actual addresses. To plot each restaurant shown in the table we need to know the latitude and longitude at which it is located. To do that, the table must be geocoded as seen below.

 

images\tbl_sushi_addresses_02.gif

 

In recent years the adoption of geocoding technology by consumer computer applications (at least in many First World nations) has also encouraged us to think of street addresses as being equivalent to a latitude/longitude location for the purpose of computer mapping. Internet mapping sites allow us to enter a street address, such as "525 Main Street, Central City," and instantly see a street map with the location of the address marked as if we had provided an exact latitude and longitude location. Low-cost navigation systems that combine GPS technology with built-in maps and street address geocoding systems allow us to specify a street address and navigate directly to that location, again, as if we had given exact latitude/longitude coordinates for our desired destination.

 

As a result, it is quite common for people to expect to be able to enter a street address into a web site or a map and to see a physical location for that address, a sort of "geocoding on the fly." Some applications may give the appearance of taking a list of addresses and displaying them straightaway as points in a map; however, in all cases the software will internally take the intermediate step of using the address to determine a latitude and longitude location for the record. The latitude and longitude location is then used to plot the location of the point.

 

Software packages use many different strategies to geocode street addresses into latitude and longitude locations. The basic approach is to maintain a large database of streets and address ranges so that the location of a particular address can be estimated from the database. Software that can perform street address geocoding may be built into a GIS package, it may be sold as separate geocoding software, or it may be provided as an Internet web service.

 

If we install the Manifold Geocoding Tools package, we can turn on the street address geocoding capability that is built into Manifold. The Manifold street address geocoding engine becomes functional when the Geocoding Tools package is installed and a geocoding data source is available, such as the Manifold Geocoding Database for US streets that is provided on the Manifold downloads site. If we have installed Geocoding Tools and a geocoding data source we can use Manifold to geocode a table of valid US street addresses with the approximate position of each address.

 

To geocode street addresses outside the US, other geocoding data sources may be used. For example, Microsoft's MapPoint product may be used as a geocoding data source for addresses in Canada or in various European countries. See the Geocoding Data Sources topic.

 

How Street Address Geocoding Works

 

Any geocoding software (including Manifold) that provides street address geocoding must look up the street address and find an equivalent location in a database. Unfortunately, there are no global databases that provide an accurate location for each street address, although it is possible through special means to create local databases that do.

 

Taking the United States as an example, there is no national database that specifies exactly where all addresses are located. This is mainly because addresses in the US are highly irregular, are poorly documented and change too rapidly for either private companies or government agencies to keep up with perfect accuracy. Therefore, when using any street address geocoder it is important to understand that the output of the geocoder is almost certainly an approximate location and not an exact location.

 

The closest approximations to a national database of address locations that exist in the United States are the U.S. Bureau of the Census TIGER database and the TIGER/Line data sets derived from TIGER. TIGER/Line attempts to show known roads with address ranges for each road segment. Actual addresses are not noted, but are represented only as a best effort at showing the address range (from lowest to highest address number) that occurs in a particular street segment. Most geocoding software in the United States, including Manifold, uses databases that are derived in some way from the TIGER/Line data sets.

 

Based on data sets like those created by the Census Bureau, geocoding software can compare a record's address, such as 525 Main Street, Central City, Idaho, 01120, to an internal database of street segment coordinates and address ranges for each segment. For example, after zeroing in on Main Street by using the State, ZIP code and City fields, the software can find the Main Street segment whose address range contains the address number at hand. If one particular Main Street segment has a high value of 600 and a low value of 500 for the address range on that segment, the software could then reasonably infer that 525 Main Street is located about one fourth of the way up that particular street segment. It could then assign the latitude and longitude of that interpolated spot to the record.
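The interpolation described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the segment is treated as a straight line between two hypothetical endpoint coordinates, and the Segment structure is invented for the example. Production geocoders typically also distinguish the separate left and right (odd and even) address ranges that TIGER/Line records for each segment and follow the segment's full shape.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    """One street segment and its address range (hypothetical structure)."""
    low: int      # lowest address number on the segment
    high: int     # highest address number on the segment
    start: tuple  # (lat, lon) at the low-address end
    end: tuple    # (lat, lon) at the high-address end

def interpolate_address(number, seg):
    """Estimate (fraction along segment, (lat, lon)) by linear interpolation."""
    if not seg.low <= number <= seg.high:
        raise ValueError("address number outside the segment's range")
    if seg.high == seg.low:
        t = 0.5  # degenerate range: use the segment midpoint
    else:
        t = (number - seg.low) / (seg.high - seg.low)
    lat = seg.start[0] + t * (seg.end[0] - seg.start[0])
    lon = seg.start[1] + t * (seg.end[1] - seg.start[1])
    return t, (lat, lon)

# Hypothetical Main Street segment with addresses 500 to 600.
main_st = Segment(low=500, high=600, start=(43.6000, -116.2000), end=(43.6040, -116.2000))
t, point = interpolate_address(525, main_st)
print(t)  # 0.25
```

With the segment from the text, 525 lands a quarter of the way along. Note that the function returns a location for any number in range, whether or not a building with that number exists, which is exactly the limitation discussed in the following paragraphs.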

 

It is important to understand that the geocoding software has no idea where the actual address is located. It simply interpolates the location of the address by making what is hopefully a reasonable guess based on the address range recorded for a given street segment. Clever software can use a variety of strategies to make better guesses, but at the end of the day the results are usually accurate only to within a city block in urban areas and can be wildly inaccurate in rural areas. An address of the form Rural Route 10 Box 82 in a rural area, for example, might not be geocodable to within tens of miles, if it is geocodable at all.

 

Note that the above method of creating and using geocoding databases does not store the actual locations of specific street addresses. It only says that if there are any street addresses on, say, this particular segment of Main Street, they fall in the range from 100 to 200 Main Street. If we ask the system to geocode 150 Main Street, the system does not question our implied assertion that such an address really exists. It simply takes it on faith that, if we are asking to geocode such an address, we want the system to report where that address should be, and so it plops a point halfway up that segment of Main Street.

 

The geocoding database does not actually know whether there are any addresses on that segment of Main Street or, if there are, what those addresses might be. It simply knows that if there are any addresses on that segment, they will be between 100 and 200. As far as the database is concerned, that segment could run past nothing but empty fields. Because the database is structured this way, segment and address range data cannot be used for accurate reverse geocoding.

 

Reverse geocoding is the process of finding all addresses that are within a given range of a given location. Reverse geocoding can be done with any sort of accuracy only if the database in use is a points-of-interest style database: instead of segments with an address range on each segment, it contains points, each with its own location and specific address. Such databases also give a precise location for each address instead of an estimated, interpolated location.
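With a points-of-interest database, finding the addresses within a given range of a location reduces to a distance test against each stored point. The sketch below assumes a hypothetical in-memory dictionary of address points and uses the haversine great-circle formula on a spherical Earth; a real system would use a spatial index rather than scanning every point.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius in meters (spherical model)

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two decimal-degree points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def addresses_within(points, lat, lon, radius_m):
    """Return (distance_m, address) pairs within radius_m, nearest first."""
    hits = []
    for address, (plat, plon) in points.items():
        d = haversine_m(lat, lon, plat, plon)
        if d <= radius_m:
            hits.append((d, address))
    return sorted(hits)

# Hypothetical points-of-interest database: exact address -> (lat, lon).
poi = {
    "100 Oak St": (37.45000, -122.18000),
    "102 Oak St": (37.45010, -122.18000),
    "900 Elm St": (37.46000, -122.18000),
}

for dist, addr in addresses_within(poi, 37.45000, -122.18000, 50):
    print(f"{addr}: {dist:.1f} m")
```

A segment-and-range database has no stored points to test in this way, which is why it cannot support accurate reverse geocoding.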

 

Creating geocoding software that can accurately assign an exact, non-interpolated location for each individual address requires a database of all addresses and their exact latitude and longitude locations, that is, the points-of-interest style of geocoding database. To support 911 service and other emergency response services, some towns are using GPS equipment to create precise databases that show the exact location of each address in their town. Manifold can use such user-provided data sets for precise address geocoding and for reverse geocoding. See the Geocoding Data Extensions topic.

 

For information on using Manifold's street address geocoding system, see the Street Address Geocoding topic.

 

Street Address Geocoding Outside the United States

 

Unfortunately, the United States is the only country that places large government databases of street address ranges like TIGER/Line into the public domain. In other countries, acquiring a database that shows streets and address ranges for those streets is very costly, and in many cases not possible.

 

As a result, there are many fewer choices for street address geocoding software outside of the United States. Because the Manifold Geocoding Database is based on public domain government data, it is possible for Manifold to provide it for no additional cost on the Manifold downloads site. Because there is no public domain data for streets and address ranges outside the United States, Manifold provides no geocoding databases for locations outside the United States. However, Manifold includes an option to use Microsoft's MapPoint product as a geocoding data source for addresses in Canada or in eleven European countries. See the Geocoding with MapPoint topic.

 

Another way to do geocoding outside of the US is to create our own point location geocoding database and then use that as a geocoding data source. A fast way of creating such a point location geocoding database is to use Manifold's GPS Console with a portable computer. Two people ride in a vehicle with one person driving and the other person operating the computer. They drive to each address and the computer operator uses Instant Data to add a point at that location with the exact address. This can be a very fast process, so fast that an experienced, determined team of two people in a single vehicle can geocode thousands of addresses per day. A town of 50,000 addresses can usually be geocoded in well under a week's time.

 

The resultant point location database can then be used for geocoding with Manifold. See the Geocoding Data Extensions topic.
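Geocoding against such a point location database is an exact lookup rather than an interpolation. In the minimal Python sketch below, the normalization rules, the abbreviation table and the sample coordinates are all hypothetical; real address standardization is considerably more involved. An address that was never surveyed returns None instead of an estimated position.

```python
import re

# Hypothetical abbreviation table; a real standardizer needs far more rules.
ABBREVIATIONS = {"STREET": "ST", "AVENUE": "AVE", "ROAD": "RD"}

def normalize(address):
    """Uppercase, drop punctuation, collapse spaces, abbreviate street types."""
    words = re.sub(r"[^\w\s]", " ", address.upper()).split()
    return " ".join(ABBREVIATIONS.get(w, w) for w in words)

# Points captured in the field with GPS: normalized address -> (lat, lon).
point_db = {
    normalize("12 Harbour Road"): (51.50720, -0.12760),
    normalize("14 Harbour Road"): (51.50735, -0.12748),
}

def geocode_exact(address, db):
    """Exact lookup: stored coordinates, or None if the address was never surveyed."""
    return db.get(normalize(address))

print(geocode_exact("12 harbour rd.", point_db))   # (51.5072, -0.1276)
print(geocode_exact("16 Harbour Road", point_db))  # None
```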

 

Geocoded Tables use Decimal Degree Notation

 

Geocoded tables in Manifold must have valid latitude and longitude fields giving longitudes from -180 to +180 degrees and latitudes from -90 to +90 degrees, with partial degrees written as decimal fractions. A minus sign denotes West longitudes and South latitudes. This style of writing latitudes and longitudes is called decimal degrees.
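A quick range check can catch many problems before a table is treated as geocoded. This hypothetical Python helper accepts numbers or numeric text and tests only the ranges described above; it cannot detect swapped latitude and longitude columns when both values happen to fall within range.

```python
def is_valid_lat_lon(lat, lon):
    """Return True if lat/lon read as decimal degrees within the valid ranges.

    Accepts numbers or numeric text such as "37.45". Values that float()
    cannot parse, such as "N 37 27", are rejected, as are NaN and infinity
    (every comparison below is False for them).
    """
    try:
        lat, lon = float(lat), float(lon)
    except (TypeError, ValueError):
        return False
    return -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0

print(is_valid_lat_lon(37.45, -122.18))      # True
print(is_valid_lat_lon("37.45", "-122.18"))  # True
print(is_valid_lat_lon(37.45, 237.82))       # False: 0 to 360 style longitude
print(is_valid_lat_lon("N 37 27", -122.18))  # False: not decimal degrees
```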

 

Like all modern GIS packages, Manifold uses decimal degree notation because it is an unambiguous standard that is well suited for arithmetic operations and can be written in database tables as text fields or numeric fields. Older methods of writing latitudes and longitudes, such as using the letters "E", "W", "N" and "S" or using degrees, minutes and seconds notation, are not well standardized and involve clumsy notation that is not very useful in computing operations. Manipulating values such as East 32° 42' 15" is somewhat akin to trying to do longhand multiplication using Roman numerals: not very efficient or sensible.
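The arithmetic for converting degrees d, minutes m and seconds s to decimal degrees is simple, which is part of why decimal degrees are preferred. The sign is negative for West longitudes and South latitudes; the second line works through the value quoted above.

```latex
\text{decimal degrees} = \pm\left(d + \frac{m}{60} + \frac{s}{3600}\right)

\text{E } 32^\circ\,42'\,15'' = 32 + \frac{42}{60} + \frac{15}{3600} = 32 + 0.7 + 0.0041\overline{6} = 32.7041\overline{6}^\circ \approx 32.704167^\circ
```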

 

In modern times most databases of geocoded information use decimal degrees. However, over the years many different styles have been used to write latitudes and longitudes in database tables. Older tables might use text fields to express coordinates in the form of degrees, minutes and seconds, for example. Other tables may use degrees and minutes, with decimal fractions of minutes. Some tables denote longitudes in degrees from 0 to 360. Others might use text strings and prefix a letter, such as "N", "S", "E" or "W", to indicate North or South latitudes and East or West longitudes.
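Two of the older styles listed above convert with simple arithmetic. The hypothetical Python helpers below map longitudes given from 0 to 360 into the signed -180 to +180 range, and convert degrees with decimal minutes plus a hemisphere letter into decimal degrees. They assume the values have already been read as numbers rather than text.

```python
def lon_0_360_to_signed(lon):
    """Map a longitude given as 0..360 degrees East into the -180..+180 range."""
    lon = lon % 360.0
    return lon - 360.0 if lon > 180.0 else lon

def ddm_to_decimal(degrees, minutes, hemisphere):
    """Convert degrees and decimal minutes plus N/S/E/W to signed decimal degrees."""
    value = abs(degrees) + minutes / 60.0
    return -value if hemisphere.upper() in ("S", "W") else value

print(lon_0_360_to_signed(237.82))     # approximately -122.18
print(ddm_to_decimal(37, 27.0, "N"))   # approximately 37.45
print(ddm_to_decimal(122, 10.8, "W"))  # approximately -122.18
```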

 

Manifold's approach to dealing with such tables is to import them into Manifold, where Transform toolbar operators and other tools can be used to convert coordinates into standard decimal degree notation. This allows the full power of Manifold tools to be brought to bear to adjust the table into the desired form. Clever use of token operations can transform almost any such format into decimal degrees. See, for example, the Using Tokens and Text Strings topic and the Extract Last Names using Tokens example.
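Outside Manifold, the same token idea can be sketched in Python: split a text value into word and number tokens, read the numbers as degrees, minutes and seconds, and let a hemisphere word or a leading minus sign set the sign. This is a hypothetical helper, not Manifold's token operators, and it does not check, for example, that minutes and seconds are below 60.

```python
import re

# Hemisphere words/letters and the sign each one implies.
HEMISPHERE_SIGN = {
    "N": 1, "NORTH": 1, "E": 1, "EAST": 1,
    "S": -1, "SOUTH": -1, "W": -1, "WEST": -1,
}

def parse_coordinate(text):
    """Parse text such as W 122 10 48 or 37 27 30 N into decimal degrees.

    Numeric tokens are read as degrees, then minutes, then seconds. Other
    symbols (degree signs, quote marks, spaces, dashes) only separate tokens.
    """
    words = re.findall(r"[A-Za-z]+", text)
    numbers = [float(tok) for tok in re.findall(r"\d+(?:\.\d+)?", text)]
    if not 1 <= len(numbers) <= 3:
        raise ValueError(f"expected 1 to 3 numbers in {text!r}")

    # A leading minus sign or an S/W hemisphere makes the result negative.
    sign = -1 if text.strip().startswith("-") else 1
    for word in words:
        hemisphere = HEMISPHERE_SIGN.get(word.upper())
        if hemisphere is None:
            raise ValueError(f"unrecognized token {word!r} in {text!r}")
        if hemisphere < 0:
            sign = -1

    value = sum(n / d for n, d in zip(numbers, (1.0, 60.0, 3600.0)))
    return sign * value

print(parse_coordinate("W 122 10 48"))        # approximately -122.18
print(parse_coordinate("East 32° 42' 15\""))  # approximately 32.704167
```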

 

If you have a table that has latitude and longitude values using some old-fashioned notation you should first translate those values into modern decimal degree notation. Only then is it safe to consider it a geocoded table.

 

Note: Although this documentation is written to require decimal degrees notation, in fact if the coordinate columns in a geocoded table use degrees-minutes-seconds notation, Manifold will try to parse that notation to extract valid longitude and latitude values. manifold.net strongly recommends using decimal degrees to avoid any possible ambiguity.

 

"Generic" Geocoding Strategies

 

Geocoding a table by specific addresses is often not required. Although it is easy to understand the conceptual appeal of adding an exact latitude and longitude position for each customer record by address, such geocoded tables also lay a conceptual trap for the unwary in that they are intrinsically inaccurate. Sometimes it is better to have an approximate table that does not lay claim to false accuracy. For many GIS purposes it may be enough to simply pin down a customer location to a specific ZIP code and not to a specific city block. This can be done using spatial geocoding with Manifold's Match command. See the Spatial Geocoding with Match topic for spatial geocoding within Manifold.

 

By spatially geocoding tables using key fields we can often end up with a geocoded table that combines our desired records with a latitude and longitude position for each record. The classic example is displaying customer address records using their ZIP codes. If we have a drawing that shows a point for each ZIP code centroid, we can merge the customer address table into this drawing using the ZIP code as a key field. In that case, each customer record will appear as a point at the centroid of its ZIP code. [The Manifold street address geocoding engine can also geocode addresses that consist only of ZIP codes, by geocoding each address to the ZIP code centroid, so as a practical matter there is no need to use Match to geocode to ZIP codes in the US if we have the Geocoding Tools package installed.]
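The ZIP code centroid approach is a many-to-one key-field merge: every customer in the same ZIP code lands on the same point. The Python sketch below illustrates the idea with hypothetical customers and approximate, illustrative centroid coordinates; it is not how Match is implemented. ZIP codes are kept as text so that leading zeros (as in 01120) survive.

```python
from collections import defaultdict

# Hypothetical ZIP code centroid drawing: ZIP -> (lat, lon) of its centroid.
zip_centroids = {
    "94025": (37.45, -122.18),
    "94301": (37.44, -122.15),
}

customers = [
    {"name": "A. Smith", "zip": "94025"},
    {"name": "B. Jones", "zip": "94025"},
    {"name": "C. Lee", "zip": "94301"},
    {"name": "D. Wu", "zip": "99999"},
]

# Group customers by the centroid point their ZIP code maps to.
by_point = defaultdict(list)
ungeocoded = []
for cust in customers:
    centroid = zip_centroids.get(cust["zip"])
    if centroid is None:
        ungeocoded.append(cust["name"])
    else:
        by_point[centroid].append(cust["name"])

for point, names in by_point.items():
    print(point, names)
print("not geocoded:", ungeocoded)
```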

 

This "generic" method of geocoding is often the only possible method of geocoding for international users who do not have access to a street address geocoding data source for their locations but who do have postal code or telephone code maps or other data sets that can be used as guides for spatial geocoding based on Match.

 

GIS Jargon and the "Geocoding" Word

 

New GIS users are sometimes worried about being heard using words in an atypical way that betrays their inexperience, so let's take a moment to discuss a fine point of nomenclature.

 

The word geocoding as used in this topic and generally used in the GIS world is a jargon or slang word that is reserved for the specific case of adding latitude and longitude values to records in a table in order to specify the geographic location of each record. People who are new to GIS or people who use their jargon in sloppy ways will at times use the word geocoding to refer to the general way GIS drawings keep track of the locations of things. That's a mistake that denies us the usefulness of having a handy slang word that means a single, precise thing.

 

GIS drawings contain geometry information that describes the shape and location of the objects they contain by recording the coordinates that define the shape and location of the objects. The geometry information is called the object metric (another handy slang word). It's true that in the case of single points the coordinates that define the location of a point are indeed a single pair of latitude/longitude values just like those that might appear in a geocoded table giving a list of records that are to appear as points. However, in the case of lines or areas there are many coordinates that define those objects. We don't therefore refer to the geometry of objects in drawings as being "geocoded."

 

Therefore, while a verbally-imprecise or inexpert user might say things like "I'm going to geocode those farm boundaries by tracing over an aerial photo," that's not really using the word "geocode" as it is normally expected to be used. One might say, "I'm going to geocode that table of addresses" but one would not normally talk about "geocoding" the shape of an area. Instead, one might say things like "I'm going to digitize those boundaries from a photo" or "I'm going to trace those boundaries..." or even perhaps (awkwardly) "I'm going to locate those boundaries."

 

With drawings, if a drawing needs to be placed into the correct geographic context after it is created, we usually speak of georegistering or georeferencing it (the two terms are synonyms). The subtler idea here is that the concern is not so much having chains of coordinates that correctly define the object metric as placing those coordinates correctly within an Earth coordinate system, that is, in a specific projection and at a specific location.

 

Our example of geocoding a table of fire hydrant locations brings us right to the edge of accepted use of the word geocode. If we speak in terms of adding latitude / longitude values to that table of locations, then we can be righteous about using the term geocode to describe what we are doing. On the other hand, if we have a digital map with a layer of streets and we are creating points in a drawing where the hydrants are located, we would talk in terms of locating the hydrants on the map or digitizing hydrant locations. So we might say something like "I've got to digitize these hydrant locations from that scanned image so I can geocode that table."

 

Geocoding is normally something that is done with tables which are then used to create drawings, or something that is done with individual records so they can then be displayed in drawings. For example, a web site designer might say, "When the visitor provides a street address I need to geocode it so I can figure out what part of the map to show."

 

Coordinate Columns

 

The latitude and longitude columns used in geocoded tables are also called coordinate columns, because they give the latitude and longitude coordinates at which each record is to be located. A more general form of geocoded tables occurs if the columns are labeled X and Y or have some other column names and give coordinates other than simple latitude and longitude values.

 

For example, a geocoded table might have coordinate columns that give the actual X and Y coordinates that locate the record in some specific projection, that is, in some specific coordinate system other than simple Latitude / Longitude. We're getting ahead of the game for those readers who are going through this documentation topic by topic, because we have not yet read any of the projections topics. But for the time being, trust us on this: we can have coordinate columns other than simple latitude and longitude that give the X and Y values in some projection for each record. If so, that table is also a geocoded table.
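As one illustration of projected coordinate columns, the Python sketch below converts latitude and longitude to X and Y in meters in spherical ("Web") Mercator, a projection chosen here purely as an example; the same idea applies to any projection. A table holding these X and Y values is still a geocoded table, but its values only make sense when read with the matching projection.

```python
import math

R = 6_378_137.0          # sphere radius used by spherical Mercator, in meters
MAX_LAT = 85.0511287798  # latitude limit at which the projected map is square

def lonlat_to_mercator(lon, lat):
    """Project decimal-degree longitude/latitude to spherical Mercator X/Y (m)."""
    if not -MAX_LAT <= lat <= MAX_LAT:
        raise ValueError("latitude outside spherical Mercator limits")
    x = R * math.radians(lon)
    y = R * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y

def mercator_to_lonlat(x, y):
    """Inverse projection: X/Y in meters back to decimal degrees."""
    lon = math.degrees(x / R)
    lat = math.degrees(2 * math.atan(math.exp(y / R)) - math.pi / 2)
    return lon, lat

x, y = lonlat_to_mercator(-122.18, 37.45)
print(round(x), round(y))
print(mercator_to_lonlat(x, y))  # back to approximately (-122.18, 37.45)
```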

 

Geocoded Tables and Geometry Columns

 

Geocoded tables are conceptually the easiest way for new users to understand how a table can store items that are to be displayed in a drawing. It's easy to think about: each record has a latitude and longitude location saved in some coordinate columns. Place a point at that location. The simplicity of this idea is why we introduce geocoded tables in the Introduction chapter. It is also why most GIS products have some way of working with geocoded tables.

 

But Manifold can do more, much more. A more sophisticated, more powerful and ultimately higher performance way of saving items within tables is to use geometry columns, which actually save the object metric in a column using some appropriately clever technology. Tables that can save the full object metric can have each record represent not just points but also lines or areas, no problem.

 

There are many possible ways of doing this, but there are surprisingly few GIS systems that can do it in a sophisticated way or even do it at all. In fact, there are so few systems that can do it at all that there has not even evolved a standardized slang term for such tables. Everyone in GIS knows what a geocoded table is, but for those tables that use geometry columns, well, what do you call them? There is no standard term.

 

At manifold.net we refer to such tables as geometry tables to make clear we are talking about tables that have one or more columns containing true geometry data as opposed to the very simple case of geocoded tables. We use the term geometry table regardless of the specific technology used to implement the geometry column. For example, we use the term geometry table to describe a table that stores geometry using OGC WKB "well known binary" form as well as to describe an Oracle Spatial table storing geometry using Oracle's high performance SDO_GEOMETRY data type.
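To make the idea of a geometry column concrete, the sketch below encodes a single point in OGC well-known binary (WKB) using Python's struct module: one byte-order byte (1 for little-endian), a 32-bit geometry type code (1 for Point), then X and Y as 64-bit floating point values. Lines and areas use longer encodings that add point counts and rings. This is only a minimal illustration of the format, not how Manifold or Oracle store geometry internally. WKB stores X before Y, so for latitude/longitude data the longitude typically comes first.

```python
import struct

WKB_LITTLE_ENDIAN = 1  # byte-order flag (NDR)
WKB_POINT = 1          # OGC geometry type code for Point

def point_to_wkb(x, y):
    """Encode a 2D point as little-endian OGC WKB (21 bytes)."""
    return struct.pack("<BIdd", WKB_LITTLE_ENDIAN, WKB_POINT, x, y)

def wkb_to_point(blob):
    """Decode a little-endian 2D WKB point back to (x, y)."""
    order, gtype, x, y = struct.unpack("<BIdd", blob)
    if order != WKB_LITTLE_ENDIAN or gtype != WKB_POINT:
        raise ValueError("not a little-endian 2D WKB point")
    return x, y

wkb = point_to_wkb(-122.18, 37.45)  # x = longitude, y = latitude
print(len(wkb), wkb.hex())
print(wkb_to_point(wkb))  # (-122.18, 37.45)
```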

 

Why should we bother to give such tables their own special name?

 

The reason is that the GIS world tends to divide into two classes of users and applications: those users who do relatively simple, straightforward things using geocoded tables and would like to do so as simply and in as straightforward a manner as possible, and those users who need to operate at a quantum leap higher in sophistication, typically higher end enterprise users, who do far more complex things using geometry tables and who want the full power of tools designed to work with such data.

 

To support users doing simpler, more straightforward things, Manifold provides a set of simplified dialogs and capabilities that let us work with geocoded tables in much the same fashion as any simple GIS. These features make it convenient and easy to use geocoded tables in a wide range of applications, especially when we need to interface with our colleagues from the database world or web site design world who really don’t want to learn anything about GIS. This documentation includes topics that are dedicated to using geocoded tables without getting into more complex things.

 

In contrast, those users working with geometry tables can work to their heart's content and maximum level of expertise doing all the wonderful things made possible by geometry in tables. Those users who don't need the higher level of function provided by geometry tables can stick to using geocoded tables. We introduce the notion of geometry tables here so that users teleporting into other topics are not confused when they read about geometry in tables in addition to the simple case of geocoded tables.

 

Notes

 

The table of sushi restaurants lists sushi restaurants within approximately one mile (1.6 km) of the main USGS facility in Menlo Park. The map in the illustration shows the USGS facility as a yellow diamond and plots the restaurants as green dots. Once we get over our astonishment at the provinciality of a region that can support no more than eleven sushi restaurants per mile, we can see that the restaurants in Palo Alto in the lower right are more tightly clustered than those in Menlo Park, which tend to be spread out along a single main road.

 

The illustration should not be used for navigation (none of the illustrations in Help should be used for navigation!) since the nature of sushi restaurants is to come and go almost like the seasons. Even at the time of publication of this document several of the restaurants mentioned have disappeared and have been replaced by others.

 

See Also

 

Creating Drawings from Geocoded Tables

Geocoding

Geocoding Tools

Street Address Geocoding

Geocoding Data Sources

Geocoding with MapPoint

Geocoding Data Extensions

Manifold Geocoding Servers

Spatial Geocoding with Match

Create a Linked Drawing from a Geocoded Table

Create a Map from a Geocoded Table