Strategy for importing/merging large amounts of images into 9
Mike Pelletier

2,122 post(s)
#13-Jun-22 15:48

Importing/merging large amounts of images into 9 takes a long time. There's good reason for this, but it deserves a good strategy. I got a 16 TB drive to allow plenty of space and put my Windows TEMP/TMP folders on that drive as well. The problem is that when I tried importing a folder of 1.4 TB of images, about 95% of them came in with no data (the images that succeeded were in seemingly random locations). I suppose the system ran out of resources somewhere. I also have a 4 TB SSD with Windows on it.

So now I'm importing chunks of data and it is importing fine. The trouble is that each chunk requires lots of manual steps: import, merge, delete the initial data (or copy the merged image to a different project), and then save, each with long delays.

It would be great to have a script for this. With a script, perhaps it would be even better to use smaller chunks of data that fit on an SSD. That would allow at least the merge and the removal of the initial data to be done on the SSD.

Am I approaching this correctly? Ultimately, I want to export the merged image to an ECW. Looking for ideas before submitting a suggestion. Thanks.

Attachments:
Capture.PNG

adamw


10,447 post(s)
#14-Jun-22 13:10

Scripting the merge is currently a bad idea in that the system Merge tool does things which are hard to express using SQL; one would have to roll a complex script function to match its speed.

What are the biggest pain points with a multi-step merge? The way I see it, you have to split all images into, say, 4 roughly equal parts (do not pick files randomly, obviously, group them by location so that the intermediate results are as small as possible). Then, for each part: (a) import images, (b) put all images into a map, (c) open the map, launch the Merge tool, make sure the parameters are correct (pixel size!), proceed with the merge, (d) remove all images except the merged one, (e) save the MAP file. Then you will create a new MAP file, link the MAP files with the intermediate results, and merge these results.

The last step is perhaps fast enough, it's the preparation of intermediate results that will take most of the work. For the preparation of intermediate results, steps (c) and (e) are already fast (in terms of human action involved). Step (a) is also fast, you can import multiple files in one go, you don't have to import them one by one. Step (b) is done like this: you create a map, open it, then you filter the Project pane to only show images, select all images, drag and drop them into the opened map. (To avoid a huge hit from rendering all the images, it might make sense to first drop only one image into the map, zoom into it, then drop the rest into the map. Then perhaps turn all images off in the Layers pane.) Step (d) is fast as well, you select everything except the merged image in the Project pane and delete it.
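The grouping in the first step (split the files into roughly equal parts, keeping spatially adjacent images together) can be sketched in Python. This is a hypothetical helper, not anything built into 9; it uses the sort order of the filenames as a crude proxy for spatial grouping, since tiled aerial imagery is often named by tile coordinates. Adjust the sort key to your naming scheme.

```python
def chunk_by_size(files_with_sizes, num_chunks):
    """Split a list of (name, size_bytes) pairs, already sorted so that
    neighbouring entries tend to be neighbours on the ground, into
    roughly equal-sized chunks."""
    total = sum(size for _, size in files_with_sizes)
    target = total / num_chunks
    chunks, current, current_size = [], [], 0
    for name, size in files_with_sizes:
        current.append(name)
        current_size += size
        if current_size >= target and len(chunks) < num_chunks - 1:
            chunks.append(current)
            current, current_size = [], 0
    if current:
        chunks.append(current)
    return chunks

# Example: gather TIFF names and sizes from a folder, then split into 4:
# files = sorted((p.name, p.stat().st_size)
#                for p in pathlib.Path("D:/imagery").glob("*.tif"))
# parts = chunk_by_size(files, 4)
```

Each resulting chunk then goes through steps (a) to (e) as its own MAP file.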

I agree (a)(b) and then (d) could be performed using a script, but it seems to me the UI is reasonably fast as well. As in, you maybe spend a couple of minutes selecting the files to import, then wait for hours until the import completes, then maybe spend another couple of minutes creating a map and starting the merge, then waiting for hours until the merge completes, etc.

artlembo


3,400 post(s)
#14-Jun-22 13:38

you maybe spend a couple of minutes selecting the files to import, then wait for hours until the import completes, then maybe spend another couple of minutes creating a map and starting the merge, then waiting for hours until the merge completes, etc.

While that is likely the best approach for now, it does sort of make you a slave to your computer for multiple days. When I was working on a large project I did something like this. Things were incomplete at the end of the day, so I had to go home and pick it up the next day. But, as you say, there is another really long step after that. If it could have been run overnight, things would have been ready in the morning.

So, some way to string together the multiple steps would be valuable.

adamw


10,447 post(s)
#14-Jun-22 13:46

I see. It is possible to create a script that would create a bunch of MAP files for intermediate results, import a portion of the source images into each, then save. But I don't see how one can get around waiting between individual merges, because running multiple merges at once would perhaps be counterproductive, so you have to run them one by one, and currently they have to be started manually.

We will think about exposing what Merge does as a query function. (Meaning Merge for rasters, because that's where the complexity is, Merge for vectors is trivial in comparison.)

Mike Pelletier

2,122 post(s)
#14-Jun-22 14:19

Each of my steps has been taking 6+ hours, so it gets to be a scheduling hassle with lots of downtime in between steps. As you say, the time for doing manual input to get to the next step is not a problem.

What about the notion of breaking the data into much smaller chunks, so that the import/merge/delete-inputs/save all occurs on the SSD? If that is a good idea, any suggestions on how to size each chunk based on the available space on the SSD?

Also, I once tried having a couple of Manifold sessions running on the same hard drive. I cancelled because it seemed to be taking longer than running them in succession. Is that what you mean by "counterproductive"?

adamw


10,447 post(s)
#14-Jun-22 15:31

Yes, by "counterproductive" I meant that trying to run multiple big sessions in parallel is worse than running them sequentially, running them in parallel might easily take more time. (If the sessions knew about each other and coordinated for resources, it could have been different, but such coordination only happens within a single session, not between multiple sessions.)

On using SSD, how big are the images in pixels, what is the pixel type and how many images do you have? If the SSD is big enough to host all images plus the result of the merge together, that's one story -- in that case, put TEMP onto SSD and save MAP files there as well. But if it is not big enough for that, then it is probably best to use SSD for TEMP and keep MAP files on the bigger drive. And for the final merge step you will need to point TEMP to the bigger drive as well.

Mike Pelletier

2,122 post(s)
#14-Jun-22 15:55

The images are 4-channel TIFFs, about 150 MB each, and there are 28,259 of them. I have about 3.5 TB of free space on my SSD. The SSD holds Windows.
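A quick back-of-envelope check of those numbers (a sketch only; 150 MB is an average, so the total is approximate):

```python
def dataset_size_tb(num_images, avg_size_mb):
    """Rough total size of the source imagery in TB (1 TB = 1024^2 MB)."""
    return num_images * avg_size_mb / 1024 ** 2

total = dataset_size_tb(28_259, 150)
print(f"~{total:.1f} TB of source TIFFs")  # ~4.0 TB
# With ~3.5 TB free on the SSD, the sources alone barely miss fitting,
# and sources plus the merged result certainly do not fit -- hence
# TEMP on the SSD and the MAP files on the bigger drive.
```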

Sounds like I should put the TEMP/TMP folders on the SSD and save all MAP files to the large hard drive, then do as you said above for the final merge.

Appreciate the help Adam.

adamw


10,447 post(s)
#15-Jun-22 10:29

One more thing, about the final step of converting to ECW. Do you really need to do that? We are looking at 4.5 TB of image data; say we achieve a 1:100 compression rate, that's going to produce a 45 GB ECW file. Typically, when there's so much data that takes so much effort to prepare, that's because it's going to be used by multiple people. But do you really want to be copying such a big file between machines? I'll remind you that working with the file via a network share is unreliable. Since the file is going to be read-only, it won't get damaged, but clients can easily hang up or crash. In order to serve file data reliably, you will need something like MANIFOLDSRV. And if you are going to be using that, you don't need to convert to ECW, you can just use the MAP file. Just a thought.

Mike Pelletier

2,122 post(s)
#16-Jun-22 13:15

Thanks for checking on the purpose. The big ECW is used by Mfd 8 in a couple of web maps and occasionally as a way to share the data with others. I've been using a 46 GB ECW successfully for many years.

Just wanted to check on this: do the intermediate merges help enough with the final merge to justify the time for each individual merge?

adamw


10,447 post(s)
#16-Jun-22 15:27

If you are importing individual images, intermediate merges help only if there are many overlapping parts (multiple images covering the same pixels). Your data set likely does not have that. So you can avoid intermediate merges and just merge everything. The only thing I would advise is to merge 10% of the data set first, to gauge time / space requirements for the full set (they should be roughly 10x, ideally, maybe allow for slightly more).
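The 10% pilot suggestion as arithmetic (a sketch; the 15% headroom factor is my own assumption, standing in for "allow for slightly more"):

```python
def full_run_estimate(pilot_value, pilot_fraction=0.10, headroom=1.15):
    """Scale time or disk space measured on a pilot merge of
    `pilot_fraction` of the data up to the full data set, with some
    headroom for overhead that does not scale linearly."""
    return pilot_value / pilot_fraction * headroom

# e.g. if a pilot merge of 10% took 2 hours and produced 300 GB,
# expect roughly 23 hours and ~3.45 TB for the full merge
```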

If you are linking individual images, that's a different story. There are various limits on the number of files that can be opened at the same time, and they can be pretty harsh depending on how the files are opened (which libraries are used); the limit can be as low as 1000. So, if you are linking individual images, absolutely do that in portions.

Mike Pelletier

2,122 post(s)
#16-Jun-22 16:55

Okay, good. So it sounds like, in general, the best strategy would be some automated means to import a large image dataset into a bunch of .map files and use those to do a merge. Ideally, use an SSD for the import, sizing each .map file based on the available space on the SSD, even if this greatly increases the number of .map files produced. Save these .map files to a large spinning disk as they are created, for eventual merging into a new image within a .map file.

Are there speed benefits to having one large spinning disk hold all the .map files with imported images, the new .map file with the merge, and the temp files for the merge, vs. having multiple drives each hold a portion? In my case, I have a 16 TB drive that should hopefully be able to do just that for my 4.5 TB of data.

adamw


10,447 post(s)
#17-Jun-22 07:55

Given three scenarios: (1) TEMP and MAP on SSD, (2) TEMP on SSD, MAP on HDD, (3) TEMP and MAP on HDD, the best is likely (1). But with your amount of data, SSD might not be big enough, so go with (2), it should be very competitive with (1). For your last step, SSD might not be big enough to hold just the TEMP either, so you will have to go with (3), which is slowest.

Mike Pelletier

2,122 post(s)
#17-Jun-22 12:44

Thanks again, and one last scenario. For my last step, would scenario (4), TEMP on a big HDD and MAP on the biggest HDD, be faster than (3)? I'm not sure how HDDs work. Does (4) benefit from reading and writing occurring on separate hardware at the same time, and thus more speed?

adamw


10,447 post(s)
#17-Jun-22 13:04

Yes, using two drives will be faster than using one in this case.

Mike Pelletier

2,122 post(s)
#28-Jun-22 22:14

Got all my images imported and merged. The MAP file is 3.3 TB. I rebooted and tried to export the image to an ECW on my HDD, which has plenty of free space. It starts and then fails within a few seconds, saying "Cannot write data". It creates a 1 KB ECW file in the process. Is there some sort of max size for an ECW? Other ideas?

adamw


10,447 post(s)
#29-Jun-22 09:26

What are the dimensions of the image, the pixel type and the tile size? We need to fit a single row of tiles into 2 GB. With 128x128 tiles, with 4 bytes per pixel (+1 for mask), that means that the maximum width of the image can be ~3.3 million pixels. That's pretty permissive, but maybe your image is larger than that? (If it is, you can repack the image into another one with reduced tile size. Eg, with 64x64 tiles, the maximum width will be 2x bigger, etc.)
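The limit described above can be checked with a little arithmetic (a sketch; 4 bytes per pixel plus 1 mask byte, as stated):

```python
TWO_GB = 2 * 1024 ** 3

def max_ecw_width(tile_size=128, bytes_per_pixel=5):
    """Maximum image width in pixels such that one full row of tiles
    (width x tile_size pixels) fits in 2 GB."""
    return TWO_GB // (tile_size * bytes_per_pixel)

print(max_ecw_width(128))  # 3355443 -> ~3.3 million pixels
print(max_ecw_width(64))   # 6710886 -> halving the tile size doubles it
```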

Mike Pelletier

2,122 post(s)
#30-Jun-22 15:27

See attached for all the info on the image. Looks like I'm well within the image width restriction. The 4th band is infra-red.

Attachments:
Capture.PNG

adamw


10,447 post(s)
#01-Jul-22 10:05

OK, the dimensions are fine, indeed.

I looked into the code one more time and I don't see much in the way of limits. There are some limits, but they are huge, bigger than the image in the screenshot. I also did a test export. I couldn't use a 700k x 700k, 3.3 TB image, unfortunately, but I used a 50k x 50k, ~20 GB one with the same pixel type, and the test completed fine.

The only thing that I see is that the image might refuse to render, because the index is not there or is not prepared to do that. But I don't see how that could happen if you used the Merge dialog. The Info pane shows that you do have an index, too, so the first worry is ruled out. Still, if you open the image, does it render fine? Does the window ask you to update intermediate levels (by showing a red icon on the image tab)? If it does, update intermediate levels, save the MAP file (I know, a long time, but if you don't do that and something fails, you will have to update intermediate levels again), then try to repeat the export.

Finally, if nothing seems to help, two tests:

Copy and paste the image (not the table), open the properties for the image copy and set the rect to something small (eg, if it was [ 1234567, 1234567, 998765, 998765 ], set it to [1234567, 1234567, 1235000, 1235000 ], about 1000 x 1000). Open the image, observe that it renders fine, then try to export that. Does it work or does it still fail?

Try exporting the big image to TIFF. Does it at least start to export or does it fail too (give it 3-5 minutes to try to fail, then cancel)?

Mike Pelletier

2,122 post(s)
#01-Jul-22 14:33

The image renders super fast. No red icons, pan, zoom are all good. I copied the image, changed its rectangle to 300000, 300000, 301000, 301000 and the image renders fine, exported ECW fine, and the ECW works when I link it back in. Also, exporting the full image to TIFF was working fine up to 5 minutes.

Possibly related: I tried running the NDVI query from the documentation on the image, and it showed an estimate of about 4.5 hours to complete. I came back later and it had failed with no error message, just an empty table.

Happy to share the image if that would help.

adamw


10,447 post(s)
#01-Jul-22 16:27

Thanks for the offer. :-) We won't be able to download 3.3 TB (even if the download speed is something sensible, like, 3 MB/s, we are looking at 3,300,000 of these MBs, that's 12+ days, the connection will just break somewhere during these days).

Still, we understand that the issue exists and will try to reproduce and fix it. If you could do one more test, that would help: increase the size of the sub-image from 1000 x 1000 to, say, 100,000 x 100,000 and try exporting that. It will either fail or complete (in a fair amount of time). Then, depending on whether it succeeds or fails, try one more size. If it succeeds: 250,000 x 250,000. If it fails: 40,000 x 40,000. Hopefully, this will let us avoid creating our own 3.3 TB image (we already tried a fake one, the export did not complete yet, but it seems to be working, so given that your test failed immediately, we guess we are not hitting the issue you are hitting, probably because the image is fake, need something less fake).
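The suggested test sequence is essentially a manual binary search for the failing size. Sketched in Python, with `export_succeeds` a hypothetical stand-in for actually attempting an export at a given size:

```python
def find_failure_threshold(lo, hi, export_succeeds, tolerance=50_000):
    """Narrow down the image size (pixels per side) at which export
    starts to fail. Invariant: export works at `lo` and fails at `hi`."""
    while hi - lo > tolerance:
        mid = (lo + hi) // 2
        if export_succeeds(mid):
            lo = mid  # still works, look higher
        else:
            hi = mid  # fails, look lower
    return lo, hi  # export works at lo, fails at hi

# With a real image, export_succeeds(size) would mean: copy the image,
# shrink its rect to size x size, try the export, report the outcome.
```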

Mike Pelletier

2,122 post(s)
#01-Jul-22 18:44

It seems the breaking point for failure within a few seconds is between 500,000 x 500,000 and 550,000 x 550,000. I completed a 100,000 x 100,000 export in about 21 minutes and the ECW is 1 GB. I see a bit of degradation, as expected. What is the compression ratio being used?

adamw


10,447 post(s)
#02-Jul-22 07:23

Target compression ratio for ECW is 10x. (We should provide means to control it, we will try to do so.)

Thanks for the tests, we'll try to reproduce the issue with the new info.

Mike Pelletier

2,122 post(s)
#05-Jul-22 15:03

Super and thanks.

FYI, I exported the 3.3 TB .map file (one big image) to .mxb; it came out to 2.4 TB and took 29 hours.

adamw


10,447 post(s)
#12-Jul-22 12:19

As a heads up, we made several improvements to the export code for ECW / JPEG2K. These improvements will be available in the next cutting edge build. While we were not able to reproduce the original issue described in the thread, it will be worth retrying the export with the new build, once it lands. (We have been able to export a 750k x 750k image without any problems during testing.)

Mike Pelletier

2,122 post(s)
#12-Jul-22 15:59

Thanks for that note. Looking forward to trying it. FYI, I exported my big image as two ecws (28 GB and 19 GB) without any trouble.

Manifold User Community Use Agreement Copyright (C) 2007-2021 Manifold Software Limited. All rights reserved.