Very slow data load time... Am I missing something?
kobylarz
4 post(s)
#29-Mar-23 15:42

Hello,

Apologies if this is a common topic, but I couldn't find anything on the forums. I purchased Manifold because I read that it performs better than ArcGIS Pro when processing large imagery files. I am working with several large imagery rasters that need to be merged and clipped (county-wide NAIP 2022 mosaics, ~25 GB total). Based on what I read, I thought Manifold might save me a few hours on this task.

I attempted to load two of these rasters into Manifold as a test, and they loaded very slowly. I read the documentation and it said to be patient, so I walked away and came back an hour later, and the dialog said that approximately 100,000 units/cells had loaded... of about 6.5 million. After doing some quick math, I concluded that waiting roughly 70 hours for my files to even load into the workspace wasn't going to work.
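The quick math above is a straight-line extrapolation from the progress dialog. A minimal sketch of it follows, assuming the load rate stays constant for the whole import (which may not hold in practice); the function name is made up for illustration. With the rounded figures from the post it gives about 65 hours, in line with the roughly 70 hours quoted.

```python
def estimate_total_hours(units_done: int, units_total: int, hours_elapsed: float) -> float:
    """Extrapolate total load time, assuming a constant load rate (an assumption)."""
    if units_done <= 0 or hours_elapsed <= 0:
        raise ValueError("need some progress and elapsed time to extrapolate")
    rate_per_hour = units_done / hours_elapsed
    return units_total / rate_per_hour


# Figures reported in the post: ~100,000 of ~6.5 million units after about one hour.
total = estimate_total_hours(100_000, 6_500_000, 1.0)
print(f"estimated total load time: {total:.0f} hours")
```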

Am I missing something here? Is this performance to be expected? My machine specs are 32 GB RAM, a Ryzen 5800X, and an M.2 drive, so I don't think it is a hardware resource issue. I'd appreciate any pointers or advice on how I should be working with these large files to get the advertised performance for large rasters.

Thanks!

Dimitri


7,400 post(s)
#29-Mar-23 18:54

If they are NAIP files in .TIF format, 25 GB of TIF files could expand into 200+ GB of real images. You don't mention whether you linked the images into your project or imported them. That matters. See the discussion in the Importing and Linking topic. TIFF files aren't a parallel format, but Manifold .map project files are a parallel database, so they allow far faster performance.

Once you import the images and save them in Manifold .map format, then 9 is, indeed, very much faster than Pro at processing images. For example, 9 will open a 200 GB .map file in 1/2 second, and will open a 200 GB image in that file in 1/2 second, and it will do almost anything with that image by way of processing far faster than Pro. It will also open a 100 GB .map file full of hundreds of vector layers in 1/2 second, while you'd be waiting a very long time for Pro to open a similar sized project from a GDB.

Manifold users who work with bigger data tend to store their data in .map projects for that reason, because that's so much faster than almost anything else. If they have to load in some data, like a big PBF from OSM or a collection of rasters, they'll think ahead and do that overnight or whatever, so they don't notice the one-time cost in time to get their data into very fast Manifold storage.

The basic speed of the computer you're using also matters, of course. See the Performance Tips topic.

kobylarz
4 post(s)
#29-Mar-23 20:43

Hi Dimitri,

Thanks for the response. The source files were actually in .sid format, so yes, they expand dramatically when converted to .tif. I initially tried linking the data but ran into slow performance when trying to merge or join, so I then tried importing the files. That is when I hit the very slow loading, during the import phase. I understand that it is possible to plan ahead but, like I said, it looked like the import was going to take upwards of 70 hours to complete, at least based on the progress after an hour or two. I was hoping I was missing a step that might speed up the import process.

Dimitri


7,400 post(s)
#30-Mar-23 05:21

No, for any given computer configuration there's no way around that, but at least it's a one-time cost. If you have a fast computer with plenty of RAM, fast cores, and fast SSD it will go faster, of course, than with a slow machine with small RAM and slow disk, but it's still a one-time cost.

.sid format is particularly slow because it is designed to be fast as a lossy viewing format, not as a format for precision analysis. For analytics, the first thing that has to happen is extracting what data there is in the .sid into a form where analytics can be done on it.

Ultimately, no matter how fast the analytic engine might be, it's not going to be able to go any faster than the storage technology used for data access. That's yet another example of Gene Amdahl's famous observation that a parallel process can't go any faster than the slowest, serial link.
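Amdahl's observation is usually written as speedup = 1 / ((1 - p) + p / n), where p is the fraction of the work that can run in parallel and n is the number of parallel workers. A small sketch of that formula follows; the 10% parallel fraction and the 16 workers are made-up numbers for illustration, not measurements of any .sid import.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Overall speedup when only `parallel_fraction` of the work scales with `workers`."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


# Hypothetical job: 90% of the time is serial decoding, 10% is parallel work.
print(f"16 workers: {amdahl_speedup(0.10, 16):.2f}x")
# Even with an absurd number of workers the serial 90% dominates.
print(f"1,000,000 workers: {amdahl_speedup(0.10, 1_000_000):.2f}x")
```

In this made-up case the speedup stays close to 1.1x however many workers are added, which is the point about the slowest, serial link.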

To cobble up an extreme analogy to help illustrate the point, it's like trying to do audio editing to create a new mix by storing your source audio on cassette tape or a reel-to-reel tape recorder as a working medium: very slow, because to get to every little bit you have to run the tape backwards and forwards. But if you can load all the source audio tracks into memory in a modern digital workstation, well, then you can cut and paste bits and pieces of tracks instantly however you like.

To carry that analogy into the job you're doing now, you have to "play the tape" of the .sid file all the way through just once, so that its contents can be captured into a fast, modern format. Thereafter you can work with it much more quickly.

I should also add, in the "throw hardware at it" department, that I'm not a big fan of throwing faster hardware at a task in a brute force way, but I have to admit to being surprised sometimes how fast that can be. The workstations I normally use are pretty old, mainly because I'm too lazy to migrate all my files and such to newer machines. Manifold normally is so fast on those that I don't feel a need to improve hardware. But occasionally when working with non-parallel software I'll RDP into a much faster machine to do a job and I often get a "whoa!" feeling at seeing how a state of the art, really fast machine with lots of RAM and fast SSD can do something significantly faster than the older machines I normally use.

I'm also getting better at not being so totally slovenly at how I use Manifold. I use 9 as a personal organizer and information manager, and I also keep "master projects" on hand that within them have a table of projects that I frequently open for a particular theme. Just one click launches the project in a Manifold session for me, without having to worry about remembering where the project was saved or what portable edition of Manifold I'm working with. For example, there's one for Travel where I have a list of Manifold projects related to travel to various places. I like archaeology so when I travel to Istanbul I get ready for the trip by launching my project for Turkey, planning walking routes in my free time using maps showing archaeology, where the hotel is, the stop to catch a ride to the airport, and stuff like that. There are tables/drawings with favorite hotels, restaurants, contacts, etc. Anyway, as a result of actively using Manifold right now I have 14 sessions open on my task bar.

But that's not so smart if the memory cache size set in Tools - Options for each is a bigger number, like 16 GB, because those sessions can all start competing with each other to grab and to use RAM, and if you launch a big job in one you could end up with lower performance as a result of memory thrashing between that and other sessions or other Windows programs trying to grab big chunks of RAM. It's also not smart if the cache is set really low but I want to do a bigger job.

I've suggested a solution to that, which would be a command line option or a launch option for Manifold that allows setting a cache size for each session. You could leave the default at a small number, so that simple projects with a few hundred notes used as personal information managers don't try to grab 16 GB, but for big tasks you could tell it to go ahead and grab 32 or 48 or whatever GB.

Of course, if you don't have a dozen big sessions running at the same time the above is a "don't care," but I like the idea of having many things going, so when I need something, like a password manager, it's right there at my fingertips.
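The cache contention described above can be put in rough numbers. The sketch below treats the cache size set in Tools - Options as a ceiling that every session might reach at once, which is a worst-case assumption and not a description of how Manifold actually allocates memory. The 14 sessions and the 16 GB setting come from the post; the 64 GB of physical RAM is a hypothetical figure.

```python
def cache_oversubscription(sessions: int, cache_gb_per_session: float, ram_gb: float) -> float:
    """Ratio of worst-case combined cache demand to physical RAM.

    A value above 1.0 means the sessions could, in the worst case,
    ask for more memory than the machine has.
    """
    if ram_gb <= 0:
        raise ValueError("ram_gb must be positive")
    return sessions * cache_gb_per_session / ram_gb


# 14 sessions and 16 GB per session are from the post; 64 GB RAM is hypothetical.
ratio = cache_oversubscription(14, 16, 64)
print(f"worst-case cache demand is {ratio:.1f}x physical RAM")
```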

kobylarz
4 post(s)
#30-Mar-23 15:16

Thanks for that more detailed explanation of how Manifold processes data. I think I got a little ahead of myself in thinking it would be a quick solution for the task at hand. I can see how it would be an excellent tool for working with these kinds of files repeatedly. I will keep it in mind for future raster processing work. Thanks again!

rk
618 post(s)
#30-Mar-23 10:36

Would it be fair to say that Manifold is not the right tool, if you only need to merge and clip some images once and then use the result somewhere other than Manifold? Maybe there are tools that take source image(s) and the clipping region(s) in advance, and can "fast forward" over the parts of images that won't make it into the result.

Importing into Manifold means that every part of every image is read. We cannot tell Manifold to ignore parts of images in advance.

But if you need to clip the same set of images often and you might need any part of the images in the future, then it makes sense to first import everything into fast Manifold format and clip away.
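To illustrate the "fast forward" idea, here is a rough sketch of how much of a tiled image a clip window actually touches. It assumes the source is stored as a regular grid of square tiles that can be read independently; whether a particular tool or format (such as .sid) supports that kind of windowed read is not established here. All sizes and function names are made up for illustration.

```python
import math


def tiles_touched(clip: tuple[int, int, int, int], tile_size: int) -> int:
    """Count grid tiles overlapped by a clip window.

    clip = (col_min, row_min, col_max, row_max) in pixels, max values exclusive.
    """
    col_min, row_min, col_max, row_max = clip
    first_col = col_min // tile_size
    last_col = (col_max - 1) // tile_size
    first_row = row_min // tile_size
    last_row = (row_max - 1) // tile_size
    return (last_col - first_col + 1) * (last_row - first_row + 1)


def total_tiles(width: int, height: int, tile_size: int) -> int:
    """Number of tiles needed to cover the whole image."""
    return math.ceil(width / tile_size) * math.ceil(height / tile_size)


# Hypothetical image of 100,000 x 80,000 pixels in 512-pixel tiles,
# clipped to a window covering part of it.
touched = tiles_touched((20_000, 10_000, 45_000, 30_000), 512)
total = total_tiles(100_000, 80_000, 512)
print(f"{touched} of {total} tiles ({touched / total:.1%}) need to be read")
```

A full import reads every tile once, while a one-off clip only needs the tiles under the window, which is the trade-off between the two scenarios above.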

kobylarz
4 post(s)
#30-Mar-23 15:18

Thanks for the response. I came to a similar conclusion and found a workaround using ArcGIS Pro that got the job done in a reasonable amount of time.

I'll keep Manifold in mind for future raster processing tasks that match the second scenario you listed above.

Manifold User Community Use Agreement Copyright (C) 2007-2021 Manifold Software Limited. All rights reserved.