Strategy for importing/merging large amounts of images into 9
Mike Pelletier

2,011 post(s)
#13-Jun-22 15:48

Importing and merging large numbers of images into 9 takes a lot of time. There's good reason for this, but it deserves a good strategy. I got a 16 TB drive to allow plenty of space and put my Windows TEMP/TMP folders on that drive as well. The problem is that when I tried importing a folder with 1.4 TB of images, about 95% of them came in with no data (the images that succeeded were in seemingly random locations). I suppose the system ran out of resources somewhere. I also have a 4 TB SSD with Windows on it.

So now I'm importing the data in chunks and it is importing fine. The trouble is that each chunk requires lots of manual steps: import, merge, delete the initial data (or copy the merged image to a different project), and then save, with a long delay at each step.

It would be great to have a script for this. With a script, perhaps it would be even better to use smaller chunks of data that fit on an SSD. This would allow at least the merge and the removal of the initial data to be done on the SSD.

Am I approaching this correctly? Ultimately, I want to export the merged image to an ECW. Looking for ideas before submitting a suggestion. Thanks.

Attachments:
Capture.PNG

adamw

10,011 post(s)
#14-Jun-22 13:10

Scripting the merge is currently a bad idea, because the system Merge tool does things which are hard to express using SQL. One would have to roll a complex script function to match the speed.

What are the biggest pain points with a multi-step merge? The way I see it, you have to split all images into, say, 4 roughly equal parts (obviously, do not pick files randomly; group them by location so that the intermediate results are as small as possible). Then, for each part: (a) import images, (b) put all images into a map, (c) open the map, launch the Merge tool, make sure the parameters are correct (pixel size!), proceed with the merge, (d) remove all images except the merged one, (e) save the MAP file. Then you create a new MAP file, link the MAP files with the intermediate results, and merge these results.

The last step is perhaps fast enough; it's the preparation of intermediate results that will take most of the work. For the preparation of intermediate results, steps (c) and (e) are already fast (in terms of human action involved). Step (a) is also fast: you can import multiple files in one go, you don't have to import them one by one. Step (b) is done like this: you create a map, open it, then filter the Project pane to only show images, select all images, and drag and drop them into the opened map. (To avoid a huge hit from rendering all the images, it might make sense to first drop only one image into the map, zoom into it, then drop the rest into the map. Then perhaps turn all images off in the Layers pane.) Step (d) is fast as well: you select everything except the merged image in the Project pane and delete it.

I agree that (a), (b) and then (d) could be performed using a script, but it seems to me the UI is reasonably fast as well. As in, you maybe spend a couple of minutes selecting the files to import, then wait for hours until the import completes, then maybe spend another couple of minutes creating a map and starting the merge, then wait for hours until the merge completes, etc.

artlembo


3,260 post(s)
#14-Jun-22 13:38

"you maybe spend a couple of minutes selecting the files to import, then wait for hours until the import completes, then maybe spend another couple of minutes creating a map and starting the merge, then waiting for hours until the merge completes, etc."

While that is likely the best approach for now, it does sort of make you a slave to your computer for multiple days. When I was working on a large project I did something like this. Things were incomplete at the end of the day, so I had to go home and pick it up the next day. But, as you say, there is then another really long step. If that could have been run overnight, things would have been ready in the morning.

So, some way to string together the multiple steps would be valuable.

adamw

10,011 post(s)
#14-Jun-22 13:46

I see. It is possible to create a script that would create a bunch of MAP files for intermediate results, import a portion of the source images into each, then save. But I don't see how one can get around waiting between individual merges. Running multiple merges at once would perhaps be counterproductive, so you have to run them one by one, and currently they have to be run manually.

We will think about exposing what Merge does as a query function. (Meaning Merge for rasters, because that's where the complexity is, Merge for vectors is trivial in comparison.)

Mike Pelletier

2,011 post(s)
#14-Jun-22 14:19

Each of my steps has been taking 6+ hours, so it gets to be a scheduling hassle with lots of downtime in between steps. As you say, the time for doing manual input to get to the next step is not a problem.

What about the notion of breaking into much smaller chunks so that the import/merge/delete inputs/save all occurs on SSD? If that is a good idea, any suggestions on how to dial in how big a chunk one should do based on available space on SSD?

Also, I once tried having a couple of Manifold sessions running at the same time, all on the same hard drive. I cancelled it because it seemed to be taking longer than running them in succession. Is that what you mean by "counterproductive"?

adamw

10,011 post(s)
#14-Jun-22 15:31

Yes, by "counterproductive" I meant that trying to run multiple big sessions in parallel is worse than running them sequentially: running them in parallel might easily take more time. (If the sessions knew about each other and coordinated for resources, it could have been different, but such coordination only happens within a single session, not between multiple sessions.)

On using SSD, how big are the images in pixels, what is the pixel type and how many images do you have? If the SSD is big enough to host all images plus the result of the merge together, that's one story -- in that case, put TEMP onto SSD and save MAP files there as well. But if it is not big enough for that, then it is probably best to use SSD for TEMP and keep MAP files on the bigger drive. And for the final merge step you will need to point TEMP to the bigger drive as well.

Mike Pelletier

2,011 post(s)
#14-Jun-22 15:55

The images are 4-channel tifs, about 150 MB each, and there are 28,259 of them. I have about 3.5 TB of free space on my SSD. The SSD holds Windows.

Sounds like I should put TEMP/TMP folders on the SSD, saving all map files to the large hard drive. Then do as you said above for the final merge.

Appreciate the help Adam.

adamw

10,011 post(s)
#15-Jun-22 10:29

One more thing about the final step of converting to ECW. Do you really need to do that? We are looking at 4.5 TB of image data. Say we achieve a 1:100 compression rate; that's going to produce a 45 GB ECW file. Typically, when there's so much data which takes so much effort to prepare, that's because it's going to be used by multiple people. But do you really want to be copying such a big file between machines? I'll remind you that working with the file via a network share is unreliable. Since the file is going to be read-only, it won't get damaged, but clients can easily hang or crash. In order to serve file data reliably, you will need something like MANIFOLDSRV. And if you are going to be using that, you don't need to convert to ECW, you can just use the MAP file. Just a thought.
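As a rough check of the sizes quoted in this thread, the arithmetic can be sketched in a few lines of Python. The function names are made up for illustration, decimal units are assumed (1 TB = 1,000,000 MB), and the 1:100 ratio is only the example figure above, not a measured ECW result.

```python
def dataset_size_tb(n_images, mb_per_image):
    """Total size of the source files in TB (decimal units, an assumption)."""
    return n_images * mb_per_image / 1_000_000


def compressed_size_gb(size_tb, ratio):
    """Size in GB after compressing size_tb of data at 1:ratio."""
    return size_tb * 1000 / ratio


# Files as described in the thread: 28,259 TIFFs of about 150 MB each.
on_disk_tb = dataset_size_tb(28_259, 150)
print(f"source files on disk: about {on_disk_tb:.2f} TB")

# The 4.5 TB figure used above, at the example 1:100 compression rate.
print(f"ECW estimate: about {compressed_size_gb(4.5, 100):.0f} GB")
```

The on-disk total comes out slightly under the 4.5 TB figure, which is close enough for planning drive space.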

Mike Pelletier

2,011 post(s)
#16-Jun-22 13:15

Thanks for checking on the purpose. The big ECW is used by Mfd 8 in a couple web maps and occasionally as a way to share the data for others. I've been using a 46 GB ECW successfully for many years.

Just wanted to check on this. Do the intermediate merges help enough with the final merge to justify the time for each individual merge?

adamw

10,011 post(s)
#16-Jun-22 15:27

If you are importing individual images, intermediate merges help only if there are many overlapping parts (multiple images covering the same pixels). Your data set likely does not have that. So you can avoid intermediate merges and just merge everything. The only thing I would advise is to merge 10% of the data set first, to gauge time / space requirements for the full set (they should be roughly 10x, ideally, maybe allow for slightly more).
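The 10% trial can be turned into an estimate for the full set with a simple linear extrapolation. This is a minimal sketch: the function name, the sample numbers and the 20% safety margin are assumptions for illustration ("slightly more" is not quantified above), and real scaling may not be exactly linear.

```python
def extrapolate_from_trial(trial_hours, trial_gb, fraction=0.10, safety=1.2):
    """Scale time and space measured on a trial subset up to the full data set.

    fraction: share of the data set used in the trial (0.10 for 10%).
    safety:   multiplier for headroom; 1.2 is an assumed value.
    """
    scale = 1.0 / fraction
    return trial_hours * scale * safety, trial_gb * scale * safety


# Hypothetical trial result: 1.5 hours and 400 GB of space for 10% of the images.
hours, gigabytes = extrapolate_from_trial(1.5, 400)
print(f"full merge estimate: about {hours:.0f} hours, {gigabytes:.0f} GB")
```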

If you are linking individual images, that's a different story. There are various limits on the number of files that can be opened at the same time, and they can be pretty harsh depending on how the files are opened (which libraries are used); the limit can be as low as 1000. So, if you are linking individual images, absolutely do that in portions.
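Splitting a file list into portions that stay under such a limit is easy to sketch in Python. The helper below only partitions a list; it does not link anything in Manifold. The portion size of 500 is an assumed value chosen to stay well below the 1000 mentioned above, and the file names are placeholders.

```python
def portions(paths, max_files=500):
    """Yield consecutive slices of paths with at most max_files entries each."""
    if max_files < 1:
        raise ValueError("max_files must be at least 1")
    for start in range(0, len(paths), max_files):
        yield paths[start:start + max_files]


# Placeholder names standing in for the 28,259 images from the thread.
all_files = [f"tile_{i:05d}.tif" for i in range(28_259)]
parts = list(portions(all_files))
print(len(parts), "portions, last one has", len(parts[-1]), "files")
```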

Mike Pelletier

2,011 post(s)
#16-Jun-22 16:55

Okay, good. So it sounds like in general the best strategy would be some automated means to import a large image dataset into a bunch of .map files and use these to do a merge. Ideally use an SSD for the import, sizing each .map file based on available space on the SSD, even if this greatly increases the number of .map files produced. Save these .map files to a large spinning disk as they are created for eventual merging to a new image within a .map file.
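Sizing each chunk to the space available on the SSD could be sketched like this. It is not a Manifold script: it only groups file paths until a byte budget is reached. The names `scan_images` and `chunk_by_budget`, the `*.tif` pattern and the budget are assumptions for illustration. Sorting by file name assumes that tile names roughly follow location, so neighbors stay together. How much SSD space an import needs per byte of source is not stated in this thread, so the budget should come from a trial import.

```python
from pathlib import Path


def scan_images(folder, pattern="*.tif"):
    """Return (path, size_in_bytes) pairs for a folder, sorted by file name."""
    return [(p, p.stat().st_size) for p in sorted(Path(folder).glob(pattern))]


def chunk_by_budget(files, budget_bytes):
    """Group (path, size) pairs into consecutive chunks within a byte budget.

    A file larger than the budget still gets a chunk of its own.
    """
    chunks, current, used = [], [], 0
    for path, size in files:
        # Close the current chunk when the next file would exceed the budget.
        if current and used + size > budget_bytes:
            chunks.append(current)
            current, used = [], 0
        current.append(path)
        used += size
    if current:
        chunks.append(current)
    return chunks


# In-memory example: five files with sizes in arbitrary units, budget of 8.
demo = [("a.tif", 4), ("b.tif", 4), ("c.tif", 4), ("d.tif", 10), ("e.tif", 1)]
for number, chunk in enumerate(chunk_by_budget(demo, 8), start=1):
    print(number, chunk)
```

For a real folder, the input would be `scan_images(folder)` with the folder path filled in, and each resulting chunk would be imported into its own .map file.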

Are there speed benefits of having one large spinning disk hold all the .map files with imported images, the new .map file with the merge, and temp files for the merge vs having multiple drives that hold a portion? In my case, I have a 16TB drive that should hopefully be able to do just that for my 4.5 TB of data.

adamw

10,011 post(s)
#17-Jun-22 07:55

Given three scenarios: (1) TEMP and MAP on SSD, (2) TEMP on SSD, MAP on HDD, (3) TEMP and MAP on HDD, the best is likely (1). But with your amount of data, SSD might not be big enough, so go with (2), it should be very competitive with (1). For your last step, SSD might not be big enough to hold just the TEMP either, so you will have to go with (3), which is slowest.

Mike Pelletier

2,011 post(s)
#17-Jun-22 12:44

Thanks again, and one last scenario. For my last step, would scenario (4), TEMP on a big HDD and MAP on the biggest HDD, be faster than (3)? I'm not sure how HDDs work. Does (4) benefit from reading and writing occurring on separate hardware at the same time, and thus give more speed?

adamw

10,011 post(s)
#17-Jun-22 13:04

Yes, using two drives will be faster than using one in this case.
