Spatial DBMS

This topic continues the discussion begun in the Data Storage Strategies topic. In this topic we look more closely at storing GIS data, in particular drawings, within database management system (DBMS) packages. This topic provides a general introduction to the Manifold approach to spatial DBMS. See the Spatial DBMS Facilities topic for detailed information on Manifold spatial DBMS capabilities, including information on supported spatial DBMS products.

 

The use of spatial DBMS technology is a great way to store and work with drawings and images of immense size with high performance. Manifold provides a wide range of capabilities to take advantage of spatial DBMS:

 

·      Manifold can store drawings within major spatial DBMS products (IBM DB2 with IBM Spatial Extender, Microsoft SQL Server 2008 spatial, Oracle and PostgreSQL/PostGIS) using the native spatial DBMS facilities within those products.

·      Manifold can store drawings, images and surfaces in almost any DBMS using Manifold-managed spatial DBMS capabilities.

·      The Manifold Spatial Extender for SQL Server 2005 provides especially fast Manifold-managed spatial DBMS for SQL Server 2005 users.

·      Manifold can store images and surfaces within Oracle DBMS products supporting Oracle native GeoRaster capability.

·      In addition to using Manifold or native spatial DBMS capabilities, Manifold can also connect to ESRI-style SDE geodatabases and Personal geodatabases and import and link (for read/write editing) drawings in SDE or Personal geodatabases.

 

DBMS products have become so powerful, so inexpensive and so easy to install that more and more GIS applications now take advantage of DBMS. GIS users from individual hobbyists to the largest enterprises can now benefit from spatial DBMS technology through Manifold.

 

Manifold Enterprise Edition or above is required to connect using native connections to spatial DBMS such as Oracle Spatial, DB2 with IBM Spatial Extender, SQL Server 2008 spatial and PostgreSQL / PostGIS. While Personal and Professional editions can make use of spatial DBMS in some cases once configuration has been done with Enterprise Edition, in general Enterprise Edition or above is required for spatial DBMS work.

 

Two Types of Data in Drawings

 

Most GIS packages, Manifold included, use two types of data to make up drawings:

 

·      One type of data is the geometric information that defines objects in the drawing. This type of geometric information is often simply called the geometry of the drawing. It specifies the shape and location of the points, lines and areas of which the drawing consists.

·      The second type of information is the data in the optional attributes that may be attached to objects in a drawing. For example, a drawing that consists of points showing the locations of cities might also have attributes that give the name of each city, the population of that city and other information.

 

GIS packages use geometry to draw the shapes and locations of points, lines and areas in a window. Attribute data is normally shown using a row and column table visual interface like that used in DBMS packages. A drawing may or may not have attributes for the objects it contains, but it always has geometry because without geometry there would be no points, lines or areas in the drawing.

 

Storing Attribute Data in a DBMS

 

A GIS package becomes more flexible, useful and powerful if it can work with attribute data that is stored in an external DBMS. That's very useful even if the GIS continues to manage all of the geometry.

 

Many GIS packages have this capability, although it is usually limited to read-only linkages or available only in a limited number of data access methods, such as ODBC. Manifold allows linking of attribute data from an external DBMS with full read / write / edit capabilities using a very wide range of data access methods. See the Relations topic and the Attaching External Tables to Drawings topic for Manifold techniques for using attribute data from external sources.

 

Linking attribute data from a DBMS brings a lot of advantages to GIS users. The foremost advantage is that there is much more data maintained in commercial DBMS products than exists in GIS storage, so being able to link data into GIS projects from an external DBMS gives access to a much wider range of data.

 

The second key advantage is the ability to dynamically update a GIS project using data that is maintained in some external DBMS. Suppose, for example, we want to graphically display the sales of our company in various regions by thematically changing the color of regions in accordance with a sales attribute for each region. If the sales attribute is automatically fetched into the drawing from our corporate DBMS, whenever that corporate DBMS is updated by some other process (such as, perhaps, by our company's order entry system), our drawing will automatically be updated as well.

 

A third advantage is that storing attribute data in a DBMS also can take advantage of ubiquitous data interchange between applications that can work with DBMS. If our attribute data is stored in a DBMS there are likely thousands of applications and utilities that can work with it without having to know anything about our GIS package. Applications can connect to the DBMS as they have long known how to do, while the GIS takes data as needed from the DBMS.

 

Storing Geometry Data in a DBMS

 

There are many benefits to storing attribute data in an external DBMS even though geometry continues to be managed exclusively within the GIS. But so long as the GIS manages geometry users will be subject to whatever limitations that may involve. There are usually several limitations imposed by having geometry managed exclusively by the GIS package.

 

The first limitation is that GIS packages usually do not have the sheer capacity and cluster scalability of mainstream DBMS packages like Oracle or SQL Server. As a result, the number of objects in a drawing will normally be limited by the performance of the GIS running on a single machine.

 

A second limitation is often a restriction to single users or single processes working with the geometry data stored by the GIS. Mainstream DBMS packages have evolved to meet intensely multiuser, multi-process needs but most GIS packages (Manifold is an exception) have not been built with the multi-process, transaction-oriented architecture required for intensively multiuser operations on geometry.

 

A third limitation is limited dynamic interoperability between different GIS packages and other applications. GIS data can be interchanged more or less successfully in a static, limited way using well known formats such as shapefiles, but cannot in general be interchanged dynamically as is taken for granted with DBMS servers.

 

A GIS that can store geometry in a DBMS can escape the above limitations. Storing geometry in a DBMS can take advantage of the capacity and scalability of the DBMS, resulting in drawings that can be terabytes in size. If a DBMS stores geometry, a GIS can operate as a client to the DBMS server in multi-user settings, taking advantage of the extensive apparatus a modern DBMS will provide to manage the simultaneous use of data by many different users. Storing geometry within the DBMS using data types native to the DBMS allows interchange with any application that understands those data types, which is usually many more applications than can parse the proprietary internal formats used for geometry storage within even very popular GIS products.

 

Manifold can work with geometry stored in databases using the widest and most flexible range of data access methods of any GIS system. In fact, Manifold can do so simultaneously with many different data sources and access methods, including geometry stored within Manifold itself.

 

Native Geometry and Non-Native Geometry Types

 

All data stored in a DBMS is ultimately in binary form. When storing geometry within a DBMS the question is what internal format the DBMS should use to order the binary data used to store that geometry. There are two approaches.

 

One approach is used in DBMS products sold as "spatial" DBMS. A spatial DBMS will have a pre-defined way of organizing binary data to represent geometry, and this pre-defined way of organizing binary data is built into the DBMS in the form of a data type, such as SDO_GEOMETRY in Oracle Spatial or ST_GEOMETRY in IBM's DB2 Spatial Extender. Because this data type is built into the DBMS it is called a native geometry type. The data is still binary data, of course, but it has been organized in accordance with a format expected for geometry data by the DBMS. When a DBMS product has its own, native geometry type it also usually supports that data type with additional infrastructure, such as the automatic creation of spatial indices or the provision of DBMS server commands that understand that data type.

 

The other approach is used with DBMS products that do not specify a pre-defined way of organizing binary data to represent geometry but which allow applications to utilize a generic binary data type. Almost all modern DBMS packages provide a generic binary data type that can be used to store binary data unstructured by the DBMS. Often referred to as a blob, such generic binary storage can be employed by applications as they see fit. When such generic binary storage is used to store geometry in a form not built into the DBMS it is called a non-native geometry type. Although the use of non-native geometry types allows storing geometry within general-purpose DBMS products without requiring a special "spatial" form of a DBMS, it does require a GIS application that supports the geometry formats to be used.

 

Manifold can use either native or non-native types of storage when available within a DBMS and can even use both types simultaneously. It is important to note that in either case the data is stored in binary form using some organizing format to represent geometry in that binary data. The difference is that in the one case the DBMS has its own special format for organizing that binary data and in the other case it is the GIS interpreting that binary data. Both cases can be equally fast if the non-native type is an efficient geometry format such as Manifold GEOMETRY, since in both cases the speed is a function of the DBMS's ability to work with binary data, perhaps as assisted with specialized indices.

 

The main advantages of using a native geometry type within a spatial DBMS is first, that doing so provides interoperability with any application that uses the native type, and second, using a native type automatically takes advantage of the infrastructure within the spatial DBMS that supports that native data type. The disadvantage is that a native geometry type might not be available within the DBMS package we desire to use, or that it may involve extra cost.

 

The main advantages of non-native binary storage are first, Manifold makes use of spatial DBMS functionality utilizing non-native types within virtually every DBMS and second, using a choice of non-native geometry types can open the door to greater flexibility than is possible by committing to a single data type within a single DBMS vendor. A possible disadvantage is that choosing from a variety of geometry types can make interoperability with other applications more difficult. However, if a generic geometry type that is well-understood and accepted by many applications, such as WKB, is used, then interoperability might well be preserved.

 

Spatial DBMS

 

A spatial DBMS can make it especially convenient to leverage the power of the DBMS by using native geometry types within the DBMS and by extending the internal logic of the DBMS to handle spatial operations of interest for GIS applications. When a DBMS offers a native geometry type together with supporting capabilities (such as spatial indices) it is referred to as a spatial DBMS.

 

GIS has become such big business that major DBMS vendors have extended their DBMS products with native geometry types as well as with supporting capabilities. Locator capability within regular Oracle DBMS editions as well as Oracle's dedicated spatial product, Oracle Spatial, provide Oracle's SDO_GEOMETRY type as well as Oracle GeoRasters and powerful spatial operations. Other DBMS packages such as IBM DB2 have spatial extenders and open source DBMS packages such as PostgreSQL now include "spatial" capabilities as well.

 

Microsoft SQL Server allows third parties to extend the DBMS product to provide a native geometry type and spatial operators, as several third parties have done. In addition, Microsoft's new SQL Server 2008 includes Microsoft-engineered geometry types, GEOMETRY and GEOGRAPHY, together with supporting spatial capabilities.

 

The availability of a spatial DBMS with a native geometry type together with built-in spatial operators confers enormous benefits to a GIS package that can integrate well with that spatial DBMS. Capacity and scalability can increase to whatever the DBMS can handle, terabytes in the case of modern DBMS server clusters. Even if a GIS cannot manage terabytes at one time, by using the spatial capabilities of the DBMS to extract data in manageable area of interest (AOI) subsets, a GIS user can work effectively with that part of the data of interest taken from a data set that is much larger than the GIS client alone could handle.

 

When geometry is stored in a spatial DBMS multiuser and transaction issues need no longer be limited by the GIS but instead can be handled by the formidable apparatus of the DBMS while the data itself can be protected by the transactional integrity of the DBMS. DBMS servers provide transactions, triggers, views and many other capabilities not often found within a GIS.

 

Because mainstream DBMS packages are used by organizations for many more applications than GIS, storing data within the DBMS package's own native data types tends to assure interoperability with a far greater number of applications than can work with any one GIS vendor's proprietary formats, potentially including a greater number of GIS packages as well. GIS vendors may not be able to agree among themselves to open their formats to each other, but every GIS vendor knows they are not in the game if they do not support Oracle's SDO_GEOMETRY or Microsoft's new GEOMETRY type in SQL Server 2008 spatial.

 

Spatial Indices

 

The spatial logic in a spatial DBMS normally consists of enabling infrastructure in the form of spatial indices maintained by the DBMS together with a collection of spatial operators that can be executed server-side by the DBMS server. A spatial index is a specialized form of DBMS index maintained for the geometry type that makes it possible to quickly identify objects based upon spatial characteristics, such as the location and extent of a particular object. Spatial operators are normally functions that ascertain spatial relationships, such as finding all objects that are contained within a given object.

 

When Manifold adds spatial capability to an ordinary DBMS, spatial indices will be created and used. There are important requirements and limitations to spatial indices in such cases that are covered in the Manifold Spatial DBMS Facilities topic.

 

Adding Spatial DBMS Capability to "Ordinary" DBMS

 

Manifold can confer spatial DBMS functionality to almost any DBMS package even if the DBMS does not itself have "spatial" features such as a native geometry type or spatial indices. Manifold does this by providing Manifold-sponsored spatial infrastructure within the DBMS to support usage of non-native geometry as is done with native geometry types within a spatial DBMS. Manifold utilizes geometry types desired by the user and creates spatial indices which the DBMS then operates.

 

Note that there is nothing about a "spatial" DBMS's native geometry type which makes it any better or worse as a binary format for storing data than, say, Manifold's own GEOMETRY type. Embed within a DBMS a spatial index tailored for GEOMETRY and we have almost all of what people really use within a spatial DBMS. In fact, Manifold GEOMETRY is actually significantly more efficient and powerful than the native geometry types used by either DB2 or PostgreSQL. It is just that those DBMS have some facilities that assume their native geometry type is in a particular format.

 

When Manifold provides spatial infrastructure, Manifold creates a spatial index in the target DBMS for Manifold geometry data stored in the DBMS. In such cases, Manifold stores geometry data as a binary blob, but it does so in a way the DBMS would do if it had its own geometry type, that is, with a spatial index maintained to allow sophisticated and powerful manipulation of that geometry data. The spatial index established by Manifold is maintained by the DBMS server using the power of the DBMS package and will automatically be updated if a linked drawing is edited. This confers upon ordinary DBMS packages a true "spatial" capability maintained together by the DBMS and by Manifold.

 

Manifold requires no add-ins to be installed in order to provide the above spatial capabilities to virtually any DBMS. However, because SQL Server 2005 is so popular in the Manifold user community and because SQL Server 2005 allows add-ins to extend functionality within SQL Server 2005, Manifold provides the free Manifold Spatial Extender for SQL Server 2005, an add-in that enables Manifold-managed spatial DBMS capability within SQL Server 2005 to operate with especially high performance.

 

Some people may say that a DBMS cannot be considered a true "spatial" DBMS unless in addition to having a geometry type plus supporting spatial infrastructure such as spatial indices it also has the ability to do server-side spatial operators, for example, to have operators such as a CONTAINS within the DBMS package's own SQL. Server-side operators are very important when the spatial DBMS is used as the "black box" infrastructure behind some application, such as a web site where data from the database is fetched using SQL queries or other code. However such server-side spatial operators are not usually employed when a rich GIS package is used as a "front end" to the spatial DBMS.

 

Server-side spatial operators are almost never used in an environment where the interaction with the spatial DBMS occurs through a very rich client like Manifold. The reason is that most people prefer a point-and-click user interface for spatial tasks and find it difficult to work within a black-box, command-line (that is, exclusively SQL) environment as often ends up being the only useful way to work with server-side spatial components. As a practical matter, when a GIS package like Manifold provides the user interface perhaps the only server-side operation executed by most users is an Area of Interest (AOI) specification to grab a manageable subset of some very large drawing. However, AOI does not require server-side spatial operators - it requires only a spatial index as conferred by Manifold.

 

Therefore, the use of a Manifold-declared spatial index together with straightforward Manifold GEOMETRY storage within an "ordinary" DBMS such as MySQL or Microsoft SQL Server 2005 (as opposed to built-in spatial support in SQL Server 2008) can provide the benefits sought by users through "spatial" DBMS. It is a way for GIS users to get outstanding price/performance by using ordinary DBMS packages, even possibly free packages, as true spatial DBMS servers storing data of effectively limitless capacity, storing drawings, images and surfaces, and to leveraging the power of increasingly multiprocessor systems and increasingly immense RAM.

 

In fact, by using the spatial DBMS capability conferred by Manifold onto a regular DBMS we can usually gain more "spatial" capabilities than are provided by some spatial DBMS packages. For example, Manifold conferring spatial capability onto an ordinary DBMS allows us to store images and surfaces as well as drawings while most spatial DBMS packages can only store drawings. See the Spatial DBMS Facilities topic for information on storing images and surfaces.

 

Notes

 

Users sometimes ask why Manifold supports so many different ways of doing spatial DBMS, at times even different ways using the very same DBMS product. The answer is that the Manifold user community includes a very large number of different organizations and individuals who have very diverse interests and the company likes to provide as many options as possible to suit different tastes.

 

Although it is true that providing "generic" spatial DBMS capabilities which will also work within a native spatial DBMS such as Oracle may seem redundant, the internal architecture of Manifold is so modular that there is not a great engineering cost for making all forms of spatial DBMS storage available regardless of what data store is employed. This provides flexibility and freedom of choice which helps users get the maximum possible value out of their technology investments.

 

For example, although using native Oracle Spatial capabilities is clearly the first choice for GIS professionals working with Oracle, it could be that to take advantage of some third party utility or application that does not understand native Oracle Spatial but does understand, say, OGC WKB we would like to make some data available in our Oracle data store in OGC WKB form but still have a generic spatial index in play for good performance. Manifold makes that possible and easy.

 

Although all Manifold editions can work with attribute data and can store geometry using blobs, only Enterprise Edition and above can take advantage of spatial DBMS to store geometry using a spatial DBMS server's native geometry type.

 

This topic has discussed two forms of data, geometry and attribute data, in connection with drawings. However, there is also a third class of data, raster data for images and surfaces, that is also often stored within a spatial DBMS. All spatial DBMS packages enable storage of geometry for drawings, but not all enable storage of raster data for images and surfaces.

 

Manifold can work with raster data in some spatial DBMS packages, for example, Oracle Spatial. However, even within a particular DBMS product line raster capability may be an option not found in all versions of that DBMS. For example, the free Oracle Express package has Locator capability in it to allow spatial DBMS work with drawings, but Oracle Express does not include Oracle's GeoRaster type to enable work with images. One must step up to the full Oracle Spatial package to get GeoRaster capability.

 

When using Manifold's generic spatial DBMS capabilities, nearly any DBMS can be used to store images and surfaces. Excellent!

 

See Also

 

Spatial DBMS Facilities - A key topic for anyone working with spatial DBMS.

 

Data Storage Strategies

Database Installations

Drawing - Area of Interest

ESRI Geodatabases

Example: Storing a Drawing in Manifold Spatial DBMS

Example: Storing an Image in Manifold Spatial DBMS

Example: Storing a Surface in Manifold Spatial DBMS

Geometry in Tables

Manifold Spatial DBMS Facilities

Manifold Spatial Extender for SQL Server

Linked Drawings

Oracle Spatial Facilities

Project Pane - Open Data Source

Queries and Geoms

Spatial Extensions

SQL Server Spatial DBMS Facilities