Data Storage Strategies

How we decide to store our data says a lot about how we can use that data. In Manifold, data consists of drawings, images, surfaces, tables and the like. Manifold provides such a rich spectrum of capabilities for storing and working with GIS data that beginners are sometimes confused by the many options available.


The usual approach in the GIS industry (or, for that matter, in most other software industries) is to have two classes of storage, which may be loosely categorized as:


·      Local, Desktop storage - Applications store data in individual files, just like Microsoft Word saves documents in .doc files or Excel saves spreadsheets in .xls files. This is simple and easy for individuals but not a good model for enterprises or multiuser applications.

·      Linked, Server storage - Applications store data in centralized DBMS servers, to which users running client software can connect to link to whatever data they need. Enterprise-class database systems are usually used as the centralized server. This provides administrative centralization and support for multi-user work, but at the cost of added complexity.


Manifold can store data using either of the above models, and in addition introduces a third model, unique to Manifold, that splits the difference between the above two models:


·      Shared, Enterprise Edition - This is a simplified Enterprise server shared storage model designed for use by small and mid-sized organizations which desire the simplicity of desktop storage while having some of the organizational advantages of server storage. Data is stored as components within the Enterprise server, like documents within a file cabinet, which can then be shared in their entirety into a particular project.


The result is three levels of storage supported by Manifold, which may be freely intermingled with each other. These three levels of storage may be summarized as:


·      Local components stored in .map project files - This is the classic document-oriented desktop storage model: Users work with individual .map project files that contain the drawings, images, tables and other components they are working with. This is a typical approach for individuals or very small organizations, the classic "one file, one user" approach employed by Microsoft Office applications. Data is stored in the form of .map project files, which are the "documents," and the "file cabinet" used to store those documents is simply the Windows file system, using folders and other Windows facilities to keep things organized. There is no real multi-user activity done with this system. If someone else needs a drawing we've created, we give them a copy of the .map file containing the drawing. We hope that if they make any changes we will be able to reconcile any such changes with whatever other copies of that .map file we have on hand. Any Manifold version can use this storage model.


·      Linked components stored in database servers - This is the classic, object-oriented server storage model. Drawings, images and tables are stored in a general-purpose way within a database server. Users can link such items into their local projects. This is the general-purpose, "one server, many users" approach used by large enterprises. It is called object-oriented because linking drawings from general-purpose geometry storage in database servers allows the entire drawing to be freely edited, down to creating, editing and deleting individual objects, that is, individual points, lines and areas. The "file cabinet" is the database server, and users can link entire components (drawings, images, tables, queries or surfaces) into their projects or, in the case of drawings stored in certain spatially-enabled DBMS servers, only that part of the drawing that is of interest. Concurrent, multi-user editing of drawings is fully supported. Enterprise Edition is required for full capabilities with this model.


·      Shared components stored in Enterprise servers - This is an Enterprise server storage model, unique to Manifold, that combines benefits of the desktop and server storage models. Drawings, images, tables and other components are all kept in a centralized Enterprise server, which is simply a database configured in a Manifold-specific way to store all Manifold components. Manifold users running Enterprise Edition can link items from the Enterprise server into their projects. This is a simplified "one server, many users" approach that provides component-oriented server storage for small or mid-sized organizations. Individual users can still keep their projects saved as .map project file "documents," but the content of those documents is now fetched as whole components that are linked in from the Enterprise server "file cabinet." Simultaneous multi-user editing is not supported, but a shared component can be checked out by a user, edited, and then checked back into the server. Any changes made will be automatically propagated into any project that uses that shared component. Enterprise servers require use of Enterprise Edition on client desktops.


The above summary is just one way of looking at the many capabilities that Manifold provides for storing data. It helps explain why Enterprise Edition includes a dedicated system for working with Enterprise servers even as it also includes very general-purpose features for saving linked drawings in DBMS servers using formats like OGC WKB or Oracle Spatial SDO_GEOMETRY.
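To make the idea of a general-purpose geometry format concrete, the following minimal sketch encodes a single 2D point as OGC WKB using only the Python standard library. It illustrates the byte layout of the format itself, not how Manifold writes geometry to a server; the function names point_to_wkb and wkb_to_point are hypothetical.

```python
import struct


def point_to_wkb(x: float, y: float) -> bytes:
    """Encode a 2D point as little-endian OGC WKB (21 bytes)."""
    # '<'  little-endian packing
    # 'B'  byte-order flag (1 = little-endian)
    # 'I'  geometry type (1 = Point)
    # 'dd' the x and y coordinates as 64-bit doubles
    return struct.pack("<BIdd", 1, 1, x, y)


def wkb_to_point(blob: bytes):
    """Decode a WKB Point back into an (x, y) tuple."""
    prefix = "<" if blob[0] == 1 else ">"
    geom_type, x, y = struct.unpack(prefix + "Idd", blob[1:21])
    if geom_type != 1:
        raise ValueError("not a WKB Point")
    return x, y


wkb = point_to_wkb(1.0, 2.0)
print(wkb.hex())          # hexadecimal form often seen in database tools
print(wkb_to_point(wkb))  # round trip back to coordinates
```

Because the layout is defined by a public standard, any application that understands WKB can read the same stored bytes, which is the interoperability benefit of this storage model.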


The advantage of Enterprise server storage is that it provides an easy-to-use way of storing any Manifold component, even scripts and comments and layouts, within a centralized server from which the component can be shared by many users within many projects. Enterprise servers provide simplified administration and ease of use for users. The check out / check in editing model provides easy editing by those who have permission to do so while removing the need for training users in the nuances of concurrent multi-user editing. The disadvantage is that Enterprise server storage is specific to Manifold only (although data can always be exported to other interchange formats) and does not allow concurrent multi-user editing.
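The check out / check in model can be sketched in a few lines of Python. This is a conceptual illustration only: the class SharedComponentStore and its methods are hypothetical and do not correspond to any Manifold API. It shows why no conflict resolution is needed, since only one user at a time can hold a component for editing.

```python
class SharedComponentStore:
    """Toy model of a server that shares whole components (hypothetical)."""

    def __init__(self):
        self._content = {}  # component name -> current content
        self._holder = {}   # component name -> user who has it checked out

    def add(self, name, content):
        self._content[name] = content

    def read(self, name):
        # Projects that link a shared component always see the current content.
        return self._content[name]

    def check_out(self, name, user):
        holder = self._holder.get(name)
        if holder is not None and holder != user:
            raise PermissionError(f"{name} is checked out by {holder}")
        self._holder[name] = user
        return self._content[name]

    def check_in(self, name, user, new_content):
        if self._holder.get(name) != user:
            raise PermissionError(f"{name} is not checked out by {user}")
        self._content[name] = new_content
        del self._holder[name]


store = SharedComponentStore()
store.add("Roads", ["Main St"])
working_copy = list(store.check_out("Roads", "alice"))
working_copy.append("Oak Ave")
store.check_in("Roads", "alice", working_copy)
print(store.read("Roads"))
```

A second user who tries to check out "Roads" while it is held by "alice" simply gets an error and waits, which is the trade-off described above: simpler editing at the cost of no concurrent edits.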


Although it is more complex than using Enterprise servers, the general-purpose server storage model has a lot of appeal for interoperability with other programs. If a widely supported format, such as OGC WKB or Oracle Spatial SDO_GEOMETRY, is used, then many different applications can potentially interact with that same geospatial data in a (hopefully) vendor-neutral way. The disadvantage is that only some types of components can be stored in this way and that greater expertise is required to exploit such servers. The availability of concurrent, multi-user editing is both an advantage and a disadvantage. Along with the obvious benefits comes the greater user expertise required to resolve editing conflicts that may arise from simultaneous editing of objects by different users.


One other potential use of server storage is the use of spatially-enabled DBMS servers to store GIS data. Such servers have the ability to perform some spatial operations, such as fetching all objects in a drawing within a particular area of interest, and therefore can be used to increase performance and capacity by offloading some work from the GIS desktop. See the Spatial DBMS topic for more information about spatially-enabled DBMS servers.
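The following Python sketch illustrates the kind of selection a spatially-enabled server performs when asked for an area of interest. It is a simplified model that tests bounding boxes only; the names Box, intersects and fetch_area_of_interest are hypothetical, and a real spatial DBMS would use a spatial index rather than scanning every object.

```python
from typing import Dict, List, NamedTuple


class Box(NamedTuple):
    min_x: float
    min_y: float
    max_x: float
    max_y: float


def intersects(a: Box, b: Box) -> bool:
    """True if the two boxes overlap or touch."""
    return (a.min_x <= b.max_x and b.min_x <= a.max_x
            and a.min_y <= b.max_y and b.min_y <= a.max_y)


def fetch_area_of_interest(drawing: Dict[int, Box], aoi: Box) -> List[int]:
    """Return the IDs of objects whose bounding box meets the area of interest."""
    return sorted(oid for oid, box in drawing.items() if intersects(box, aoi))


# A tiny "drawing": object ID -> bounding box of that object.
drawing = {
    1: Box(0.0, 0.0, 1.0, 1.0),
    2: Box(5.0, 5.0, 6.0, 6.0),
    3: Box(0.5, 0.5, 2.0, 2.0),
}
print(fetch_area_of_interest(drawing, Box(0.0, 0.0, 1.5, 1.5)))  # [1, 3]
```

When the server does this filtering, only the matching objects travel to the desktop, which is how a very large drawing can be worked on without loading all of it.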


Manifold projects can mix and match the storage models above, and the intermingling of different storage models has been encouraged by advances in storage technology coupled with reductions in cost. In earlier days, for example, an individual user simply could not afford access to, say, Oracle Spatial servers. Today, Oracle Express Edition can be used for free. Manifold also provides generic spatial DBMS capabilities for almost any DBMS, including free servers such as MySQL, which can now store not only drawings but also images and surfaces with almost unlimited capacity.


Because the cost of enterprise-class DBMS products has come down to zero in some cases, even individual users can employ server storage if that makes sense for them. Likewise, some aspects of server storage have appeared in the form of web servers that play a role (albeit at usually slower performance) once played exclusively by database servers. Users can freely combine storage models to achieve exactly the result desired.


For example, we can have a project that contains purely local components together with an image linked in from a web-based image server, shared queries or scripts from a shared Enterprise server and drawings linked in from an Oracle Spatial server. The linked image may be used for background only, so it doesn't matter that it cannot be edited; the scripts or queries may be something a more experienced user in our organization has shared to our department's Enterprise server; and the drawing might be an area of interest linked in from some vast, 200 GB "seamless" drawing maintained on our organization's Oracle cluster. Our job might be making edits to the vast Oracle drawing, assisted by the scripts and queries provided by our colleague, while the background image helps us stay oriented.


Advice to Users


The following guidelines reflect how most users tend to choose a Manifold System edition and a storage strategy. Feel free to modify these guidelines for your own needs, as most users do.


·      Individual users with relatively small and mid-sized projects usually keep everything in the same .map file, especially while learning Manifold. The main exception is that large images used for backgrounds should be saved in .ecw form and linked into projects. As users get more sophisticated and find themselves with very large data holdings, they will be careful to keep their Windows systems well organized.

·      More advanced users with larger data holdings will often turn to Enterprise Edition to organize their data in Enterprise Servers. That's an especially useful strategy if a particular drawing is used in very many projects (that could be stored on various machines in one's office) and the drawing is updated from time to time. It's a lot easier to simply check out the drawing, edit it and check it back into the Enterprise server than it is to track down each project that uses that drawing and then to manually update each project by copying and pasting from a reference project somewhere.

·      Most organizations with more than one person doing GIS will use Enterprise Edition, initially to store data within Enterprise Servers to help keep things organized. This is a good solution when simultaneous, multi-user editing of drawings is not required.

·      When simultaneous, multi-user editing of drawings is required, organizations will use Enterprise Edition and will save their drawings in a database such as Oracle, SQL Server, IBM DB2, PostgreSQL, MySQL or some other convenient DBMS.

·      Very large organizations tend to centralize their operations around storage in databases using either Oracle spatial technology for drawings or WKB geometry storage in SQL Server to allow interoperability with the many applications interacting with GIS data in such organizations.

·      Organizations working with many users and large images will almost always store the images in a spatial DBMS, either using Manifold's generic spatial DBMS capability with whatever DBMS is in use or choosing a spatial DBMS that includes raster support, such as Oracle Spatial with GeoRaster.

·      On occasion, individual users with large images or image libraries will also store them in a spatial DBMS, either using Manifold's generic spatial DBMS capabilities or using the free download of Oracle Enterprise Edition to get GeoRaster storage capability if their uses fit within the permitted scope of the free download Oracle license.

·      At any time, of course, if an application requires that a linked drawing be created from data stored outside the project (be it a personal database of some kind in Access .mdb format or be it a huge corporate database leveraged by a big IMS application), then we will use linked drawings or linked tables as necessary to bring data into our project from other data sources.

·      Note that at any time we can always convert a linked or shared component into a local component by unlinking it or unsharing it. Doing so makes a snapshot of the data supplied by the data source as of that instant. This is a handy way of capturing entirely within the project's .map file the current content of a project that includes linked components, so that we can send that snapshot to a colleague who might not have the ability to link to the same data sources.
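The guidelines above can be condensed into a rough rule of thumb, sketched below in Python. This is a deliberate simplification for illustration: the function suggest_storage and its parameters are hypothetical, and it ignores factors discussed above such as large images, interoperability needs and data held in external databases.

```python
def suggest_storage(gis_users: int,
                    concurrent_editing: bool,
                    reused_across_projects: bool = False) -> str:
    """Suggest a starting storage model from three simple questions."""
    if concurrent_editing:
        # Simultaneous, multi-user editing requires drawings linked from a DBMS.
        return "linked components in a DBMS server"
    if gis_users > 1 or reused_across_projects:
        # Several users, or one drawing used in many projects: share it centrally.
        return "shared components in an Enterprise server"
    # A single user with self-contained projects keeps everything in one file.
    return "local components in a .map project file"


print(suggest_storage(gis_users=1, concurrent_editing=False))
print(suggest_storage(gis_users=4, concurrent_editing=False))
print(suggest_storage(gis_users=12, concurrent_editing=True))
```

Since the storage models can be freely mixed within one project, the result is best read as a default for most components rather than an exclusive choice.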


See Also


Database Installations

Enterprise Edition

IBM DB2 Express-C Edition

Oracle Spatial Facilities

Oracle Express Edition

Spatial DBMS

SQL Server Express Edition