rasters in a database?
artlembo


3,400 post(s)
#24-May-23 00:19

I saw this blog post, and thought it would be interesting to hear how people are using raster data with Manifold. Are you importing the rasters in, or just linking them? I know that importing rasters really inflates the .map file.

Also wondering if the round trip is worth the time savings: if Manifold can perform a raster operation in 2 minutes vs. 10 minutes in QGIS, but takes 5 minutes to import and then export later, is it worth it? What kind of time differences are you seeing when considering the import/export and transform of raster data? Curious to hear what people are actually working on.

Dimitri


7,413 post(s)
#24-May-23 09:15

I saw this blog post,

That's a blog from the creator of PostgreSQL, so you have to take what he writes with the understanding that he's writing from a very PostgreSQL-centric perspective. For example, this:

Where the database is replacing a collection of GeoTIFF imagery files, it is probably false. Raster in the database will be slower, will take up more space, and be very annoying to manage.

It would be more accurate to say "Raster in the database will be slower, if you are using PostgreSQL." PostgreSQL is notoriously slow with rasters. If you're using a database, like Manifold, which is fast with rasters, you don't have those downsides. Rasters in Manifold tend to be faster, they take up only slightly more space (and sometimes less), and they're not at all annoying to manage. In fact, they're much easier to manage than in most formats.

I always import rasters into the .map file, and I routinely work with rasters in the 20 GB to 100 GB range. I don't find that they inflate the .map file all that much compared to original formats like GeoTIFF. In exchange you get huge flexibility and way more capability.

For example, suppose I have a 100 GB GeoTIFF that shows terrain elevation data for a continent in reasonably high resolution. I can create images with 20 different stylings of that data for zero extra size (they all take their data from the same table) to show different combinations of palettes and hill shading options. Each one of those different stylings can be used as a layer.

Also wondering if the round trip is worth the time savings: if Manifold can perform a raster operation in 2 minutes vs. 10 minutes in QGIS,

Round trip time to import and export is only something to consider if you're using Manifold as an editor for data that you're forced to keep in some other format, and at that it's the tail end of what could be a very long pipeline of workflow. I keep my data in Manifold .map format because nothing else comes close to the speed and convenience of the format: pop open a 200 GB project in 1/2 second, and all that. You can also link in hundreds of GB from other projects and it's just as fast as having it all in the same project, even if your constellation of data is over a terabyte.

It also depends what you're doing with the rasters. If your analytics involve any of hundreds of operations that can use GPU effectively, you're typically looking not at 2 minutes in Manifold vs. 10 minutes in Q, you're looking at 5 or 10 seconds in Manifold vs. 10 minutes in Q. That's a big win even if you're stuck with eventually writing the data out to some ancient, slow format.

It's true that keeping everything in Manifold format means a one-time cost of time to import really huge files into Manifold. But that's true of every high performance environment. If you're doing Hadoop or other parallel work in the cloud, you have to upload your data into that cloud before you can do anything.

But once your data is in Manifold, everything becomes so convenient and fast. I don't even bother checking data sizes anymore, and I routinely use 10 GB and 20 GB raster layers just because I have them on hand and I like the way they look in my projects.

artlembo


3,400 post(s)
#24-May-23 14:28

I always import rasters into the .map file, and I routinely work with rasters in the 20 GB to 100 GB range. I don't find that they inflate the .map file all that much compared to original formats like GeoTIFF.

I haven't found that. In fact, in the last 10 minutes, I imported 60MB of USGS DEM data for Tompkins County. When I saved the individual DEM images inside of Manifold, it was 150MB. That's over double.

I merged the DEMs together into a seamless image that was 42,000 x 38,000 pixels and saved the .map file. The .map file was now 16GB!!! I was stunned. So, I did it again, just to make sure I hadn't done something wrong. When I saw the same size, I decided to do it a third time but removed the original DEM data from the file (those 9 or so DEM layers). The result was the same, about 16GB of data.

Of course, as you say, the response time was instantaneous. But going from 60 MB to 16 GB is a huge explosion of data size. I haven't even thought about what a 1m resolution image would inflate to. This of course is just fine for certain applications. But there may be other instances where linking to the data in a much smaller file would be sufficient.

hugh
200 post(s)
#24-May-23 16:53

But disk space is cheap, and in some cases, as Dimitri notes, the .map file is not so much larger than the TIFF. This is a smaller DEM I just imported, but I routinely use larger ones like Dimitri's. It may be that the size expansion you see comes from importing a more compressed FLT?

16 GB DEM TIFF:

- import: 40 seconds
- save: 91 seconds (.map file 12 GB)
- one-foot contours: 113 seconds
- save: 57 seconds
- export to FLT: 95 seconds (FLT size 8.5 GB)

Dimitri


7,413 post(s)
#24-May-23 17:08

As hugh comments, it depends on what formats you're using and what compression is being used. 60MB to 150MB isn't a big deal in my book if it gets you instantaneous speed and far more flexibility and power.

The merged image size is puzzling, but it's clear from the basic arithmetic that something is going on: 42,000 x 38,000 = 1,596,000,000 pixels. At only 1 byte per pixel that's 1.6 GB. If the DEMs contain float64 values and only one channel, that's 8 bytes per pixel for a total of almost 12.8 GB. So just from its pixel dimensions, the merged image that was created is in fact very much bigger than the sum of the DEMs. That could be a projection choice or it could be something else.
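
Here's that arithmetic as a quick back-of-the-envelope check, assuming a single uncompressed float64 channel:

```python
# Uncompressed size of a single-channel raster with float64 pixels.
width, height = 42_000, 38_000
bytes_per_pixel = 8                       # float64, one channel

pixels = width * height                   # 1,596,000,000 pixels
size_gb = pixels * bytes_per_pixel / 1e9  # decimal GB

print(f"{pixels:,} pixels -> {size_gb:.1f} GB uncompressed")
# 1,596,000,000 pixels -> 12.8 GB uncompressed
```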

Keep in mind that if you create a merged image composed of a bunch of different DEMs, those DEMs could be arranged in a pattern with lots of blank space. For example, suppose you have 20 DEM images that spatially are arranged in a diagonal line: the merged image you create is the rectangular bounding box that holds all of the DEMs, with lots of blank space. The merged image is very much larger than just the sum of the 20 DEMs running in a line from one corner of the merged image to the opposite corner.
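
To put hypothetical numbers on that blank-space effect: 20 tiles of 1,000 x 1,000 pixels laid out along a diagonal imply a 20,000 x 20,000 pixel bounding box, so the merged image has 20 times as many pixels as the tiles themselves:

```python
# Hypothetical example: 20 equal tiles of 1000 x 1000 pixels on a diagonal.
tiles, tile_w, tile_h = 20, 1_000, 1_000

bbox_pixels = (tiles * tile_w) * (tiles * tile_h)  # 400,000,000 pixels
data_pixels = tiles * tile_w * tile_h              # 20,000,000 pixels

print(bbox_pixels // data_pixels)                  # 20 -> 20x more pixels
```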

Without seeing the data, the original formats, etc. it's not really possible to guess what's going on in the specific case you mention. I've only seen very big expansions like you mention when working with ECW or SID data. I usually use GeoTIFF (because that has better odds of having the projection described accurately, as compared to the Rube Goldberg lottery one encounters with "world files" and such) and I haven't seen such big expansion with those.

But if you want to do analytics on data as opposed to just viewing it, sooner or later highly compressed data has to be decompressed into real, full sized images, whether that's done explicitly or behind the scenes.

hugh
200 post(s)
#25-May-23 17:02

However, IMHO, the kind of problem Art raises is very important given the huge scale of data GIS is increasingly going to have to deal with. Coincidentally, this geoinformatics solicitation concerning the problem came my way last night: https://www.nsf.gov/pubs/2023/nsf23594/nsf23594.pdf

Maybe you should get your team to go for it, Art.

artlembo


3,400 post(s)
#25-May-23 23:46

Ha, ha. Not with a 10 foot pole!! Administering an NSF Grant is almost as much work as doing the grant itself :-)

hugh
200 post(s)
#26-May-23 01:17

You are so right, and at 82 years old I am not volunteering to help. I mentioned it just because I think there is a major big data challenge facing GIS, and Manifold will be at the front of meeting it.

adamw


10,447 post(s)
#27-May-23 09:47

I haven't found that. In fact, in the last 10 minutes, I imported 60MB of USGS DEM data for Tompkins County. When I saved the individual DEM images inside of Manifold, it was 150MB. That's over double. I merged the DEMs together into a seamless image that was 42,000 x 38,000 pixels and saved the .map file. The .map file was now 16GB!!!

Going from 60 MB of DEMs to a 150 MB MAP file is fine (there's overhead, yes), but going from there to 42,000 x 38,000 pixels and taking 16 GB is certainly not normal. My guess is that the DEMs had different resolutions and/or projections and the pixel size chosen for the merged image was far too small. E.g.:

...there were multiple images, some with a big pixel size and others with a small pixel size. The pixel size chosen for the merged image (covering the area encircled in blue) was taken to be the small one, and this resulted in the explosion of data: imagine the entire encircled area being split into pixels as small as those in the right image. That produces a lot of data even with obvious optimizations for storing empty tiles (which we, of course, have).

Attachments:
merge-image-catch.png
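
The same trap can be sketched outside Manifold with GDAL's Python bindings (hypothetical file names; this only illustrates how the chosen resolution drives pixel count, not how the Merge transform works internally):

```python
# Merging tiles with mixed pixel sizes: the output resolution chosen for the
# mosaic determines its pixel count, and hence its stored size.
from osgeo import gdal

tiles = ["dem_10m.tif", "dem_1m.tif"]   # hypothetical tiles, 10 m and 1 m pixels

for res in ("lowest", "highest"):
    vrt = gdal.BuildVRT(f"/vsimem/merged_{res}.vrt", tiles, resolution=res)
    print(res, vrt.RasterXSize, "x", vrt.RasterYSize, "pixels")
    vrt = None  # close the dataset
```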

artlembo


3,400 post(s)
#27-May-23 10:17

I will double check the parameters and report back.

mdsumner


4,260 post(s)
#25-May-23 03:50

It would be more accurate to say "Raster in the database will be slower, if you are using PostgreSQL."

classic, gold

but it would be nice to see Manifold be able to stream from an online GeoTIFF - it's equivalent to a tile server, which you definitely don't import in toto


https://github.com/mdsumner

adamw


10,447 post(s)
#27-May-23 09:28

If you are talking about cloud-optimized TIFF files, we both read and write them (when you export something as a TIFF, that TIFF is cloud-optimized by default). We do not stream partial data though. This could be added via a dataport: take a URL for a TIFF, make a virtual image, read data from the pyramids / tiles and only read enough for the current view, communicating over HTTP. This wouldn't be super-optimal though. It would be hard to expect HTTP range requests to be cached, for example - if the server could simulate multiple images instead of one, caching would work much better because intermediate layers don't need to be as sophisticated. Or, alternatively, if you import or link a TIFF into a MAP, and then serve the MAP via a Manifold Server (TCP) instead of serving the TIFF via a general-purpose HTTP server, that would work much better as well.
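
For comparison, partial reads of that kind are what GDAL's /vsicurl handler already does against a cloud-optimized TIFF; a minimal sketch with a hypothetical URL:

```python
# Read one small window from a remote cloud-optimized GeoTIFF without
# downloading the whole file; GDAL issues HTTP range requests for the header
# and only the tiles that intersect the requested window.
from osgeo import gdal

url = "/vsicurl/https://example.com/elevation_cog.tif"   # hypothetical URL
ds = gdal.Open(url)

band = ds.GetRasterBand(1)
window = band.ReadAsArray(xoff=10_000, yoff=10_000, win_xsize=512, win_ysize=512)
print(window.shape)   # (512, 512)
```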

mdsumner


4,260 post(s)
#28-May-23 12:34

thanks Adam, I understand that a COG is just a geotiff with good internals :) - but I've become so accustomed to partial reads from /vsicurl remotely that it's hard not to want it (and just expect it) from Manifold. That said, I don't use *arbitrary* reads; usually I am just asking for a given extent, dimension and crs, and so the caching *is trivial* because I have specified the actual target result and just wait for it to populate. (I'm less interested in adapting to the many zoom levels of a remote COG and more in getting what I need - just enough pixels - at any of a wide range of scales, for anywhere in the extent of the COG - or tile server.)

I think what I'm wanting implies some new UI, so I'll think about what it might be like - but certainly with a GDAL data port + VRT I can do what I need now.
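
As a rough sketch of that "given extent, dimension and crs" request against a remote COG (hypothetical URL and bounds):

```python
# Pull just enough pixels for a target extent / size / crs from a remote
# cloud-optimized GeoTIFF, instead of reading the file in toto.
from osgeo import gdal

src = "/vsicurl/https://example.com/elevation_cog.tif"   # hypothetical URL

out = gdal.Warp(
    "/vsimem/clip.tif",
    src,
    dstSRS="EPSG:3857",                    # target crs
    outputBounds=(16_090_000, -5_200_000,  # target extent, in the target crs
                  16_190_000, -5_100_000),
    width=1024, height=1024,               # target dimensions
)
print(out.RasterXSize, out.RasterYSize)    # 1024 1024
```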


https://github.com/mdsumner

Mike Pelletier

2,122 post(s)
#25-May-23 14:34

Art, I'm not the best person to answer how people are using rasters in Manifold because I really haven't done much yet with 9. The reason is a combination of other important stuff to work on and waiting for more features. I have imported and merged 88 GB of lidar bare-earth DEM in 9. I link to it for display in other projects and occasionally make contours. It's been fast and trouble-free, and works much better than linking to the raw data, especially given that it is contained in many hundreds of files. Not sure if there's been bloating of file size with this data, but it hasn't been a concern.

Looking forward to more lidar tools and especially image blending tools for making beautiful land use images. However, my preference is a focus on labels and styling now that web mapping is close. Also, user interface and vector editing.

dale

630 post(s)
#26-May-23 01:38

Mike, I'm the same.

Import a folder full of tiles, merge, then process for slope, watershed, etc. I'm routinely working on datasets on the order of tens to low hundreds of GB.
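
For anyone without 9, roughly the same folder, merge, slope workflow can be sketched with GDAL's Python bindings (hypothetical paths):

```python
# Rough non-Manifold equivalent of the folder -> merge -> slope workflow.
import glob
from osgeo import gdal

tiles = glob.glob("lidar_dem/*.tif")           # hypothetical folder of DEM tiles

vrt = gdal.BuildVRT("merged.vrt", tiles)       # merge the tiles virtually
gdal.DEMProcessing("slope.tif", vrt, "slope")  # slope raster, in degrees
```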

No concern with file size, other than noting the constraints Dimitri mentions. Coastal strips running SW - NE are going to end up as large files, due to the null data areas.

In my case, I've been using lidar bare earth data to locate old mineshafts, other evidence of ground disturbance, and in the process found archeological features. 9 permits rapid filtering of what was once considered impossibly large datasets. I've been using mean slope to measure all sorts of anthropogenic disturbance, in what was considered undisturbed country.

At landscape scales, I can evaluate data for karst features, where I once had to surface search for the same. Age, agility, and a desire not to break bones in very rough ground mean features can now be identified in the data first. The combination of freely available data in my part of the world and software like 9 has meant huge steps both in discovery and in documenting features once thought not to exist.

And yes, image blending tools, please, if only to make web mapping output more beautiful.
