This program will never make it - but why I use it anyway! (long post, sorry)
artlembo


3,400 post(s)
#01-Apr-21 16:33

The This program will never make it thread was getting a little too focused on a few specific issues, and while I think there is really good discussion in it, I wanted to open this new thread to address some of the issues I've been thinking about as they relate to 9, and where it fits into people's solutions. Hopefully, others can discuss how they are using 9, and how it fits within their solution set.

First things first, what I call the donut hole

I like 9. It's kinda cool to work with, but like many others have said, it is still a work in progress with some limitations. Also, the interface isn't as polished as I might like.

Smaller data

The reality is, if I have a dataset with 2 million vector features or less (basically any county GIS data), Manifold 8 is still awesome - albeit dated when it comes to reading in certain formats. QGIS is also very easy to work with if you have 2 million features. ArcGIS Pro has also repackaged things very nicely. All 3 of these products cut through the data really, really nicely. So, personally, I won't likely use 9 to work with data sets under 2M, as there are just more mature products that get the job done. This is where I think most people live.

Insanely large data

Someone once told me: if it fits on your computer, then it's not big data. I tend to agree, but as I deal with GIS professionals as an educator, I like to tell them: if it's more data than you've ever worked with, then it's big data to you. But, for argument's sake, let's go with the first scenario, massive amounts of data.

In reality, QGIS, ArcGIS Pro, and Postgres aren't going to cut it, and neither are any of the versions of Manifold. To deal with this, you likely need a server, with lots of clients. And, in many cases, you need to spin up something like Hadoop to make it work. I've been there, done that, and never want to go through that experience again. I'm no rocket scientist, but I'm no dummy, either. However, to stand up a Hadoop cluster took many days. But, it got the job done. In reality, with my 40 seat lab, I could probably just break the data up into 1/40th of its size, put it on each computer, and then run Postgres on it over a weekend (getting my exercise by running to each computer to hit the run button!).

I don't see many people living in this ecosystem, and those that do, likely get paid a lot of money to spin up servers.

Very large data - the donut hole

Many people fall into this donut hole - they don't have the insane amount of data that requires Hadoop, but the data is way larger than a few million objects. This is where 9 really shines. I don't know the breaking point yet, but let me give you an idea of how I just used 9:

I'm doing some work with the International Union for Conservation of Nature Red List of Threatened Species, and we are looking at worldwide monkey populations and their relationship to mangroves.

The mangrove data is 330 billion pixels in size (yes, billion). It is a binary data source, so it is rather ridiculous to store the data like that in a raster, but such as it is. It turns out, there are 61 million mangrove points that we can extract from the raster.

The monkey polygons are not particularly a large data set, but the number of vertices is insane - oh, my goodness, it's huge! Really intricate, sinuous polygons.

We needed to determine the number of mangroves in each of the monkey polygons, and we had to buffer each monkey polygon by different amounts.

After working with the data for months, the researchers came to me, because they knew I sort of swim in this ecosystem. 9 was almost perfect for this. Almost, and hopefully, what comes next could be some good additions to the program.

I'm going to be writing a formal paper on the methodology, but the following is what I did:

1. Convert the 330 billion pixel raster to 61 million polygons (9 struggled with this, but ArcGIS Pro was able to do it).

2. Convert the polygons to point centroids in 9. I didn't even need to write code for this, just used the GUI.

3. Used the Join dialog to count the points in the polygons (more on this below; a rough SQL sketch of this step follows this list).

4. 17 minutes later, the process was done! My colleagues have been running their same command for 3 days.
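For those curious, the gist of step 3 as a query rather than the Join dialog would look something like the sketch below - the component names are made up, and I'm assuming GeomContains is the right containment predicate here:

--SQL9 (sketch only - [subdivided] and [mangrove_points] are placeholder names)
SELECT p.[poly_id], COUNT(*) AS [pt_count]
INTO [subdiv_counts]
FROM [subdivided] AS p
  INNER JOIN [mangrove_points] AS t
  ON GeomContains(p.[geom], t.[geom], 0)
GROUP BY p.[poly_id];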

Getting the overlay to work

Just like the other GIS products, 9 couldn't complete the process even after 24 hours. So, I used a little trick in PostGIS called ST_Subdivide, and turned the 928 monkey polygons into 472,000 polygons with no more than 20 vertices each. You can see my logic for subdividing the data here. After completing the join, I knew how many points were in each subdivided polygon, and simply summed the counts, grouping by the monkey polygon id.
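In PostGIS terms, the trick looks roughly like this (a sketch with made-up table and column names, not my exact script):

-- PostGIS sketch: split the big monkey polygons into parts of at most 20 vertices each
CREATE TABLE monkey_subdiv AS
SELECT m.poly_id, ST_Subdivide(m.geom, 20) AS geom
FROM monkey_polys AS m;

-- after the join has counted points per subdivided part, the roll-up is just a grouped sum
SELECT poly_id, SUM(pt_count) AS mangrove_points
FROM subdiv_counts
GROUP BY poly_id;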

That was it. I can't tell you how much data is actually there, but after everything was done, I had used up over 100GB on my computer (some data is duplicated in Postgres, and some in 9, and some in a geodatabase). But yes, that's a good amount of data.

So, 9 is a perfect tool for that donut hole, converting many days of processing to under 20 minutes. It even turned many months of batting around ideas into about 3 days (I experimented with a few options, which is what it took to get my head around this).

In this case, 9 was the only way to blast through this data without going DEFCON V with Hadoop. So, for $95, you have a screaming software product to deal with lots of data and intricacies.

Things I'd like to see Manifold add:

1. a really good raster to vector (point or polygon) tool to convert pixel values to points (the current tool could not get through all the data in any reasonable amount of time).

2. an ST_SubDivide command. This is really critical if you are going to work with big data. Sometimes you have to partition the data to give the software a chance. I'd add to that the ability to create tessellations like hexagons, because they are often used to partition a large empty space.

One comment about the Help Manual:

The biggest Con with the help manual is searching for what you are looking for. It's terrible in my opinion. I think a .chm provides so much greater flexibility. I would hope they could produce something like that. I truly hate the search function on the browser and the index tab.

The biggest Pro with the help manual is the detailed information. I never appreciated it more than when I worked on this project. The examples are really great and detailed; I had never used the Join function before, and taking only 10 minutes to read that section and the example of counting cities inside of states was all I needed. I'm a reasonably intelligent person, and the help manual treats you as such - it is written with a lot of detail so that if you've never used a function before, it will step you through it rather nicely. But, they do expect you to read it. And no, I didn't read hundreds of pages. But, I did read the pages I needed to, and took that aspect seriously (BTW, I'm the guy who, when fixing a lawnmower, doesn't read the directions - I jump right in. That won't work with 9; you really do have to read it, and all the info you need is right there).

My apologies for such a long post, but I hadn't really seen much about how people are actually using 9 in the wild, and where it fits in. Hopefully others can elaborate on their own use cases.

adamw


10,447 post(s)
#01-Apr-21 17:12

Thanks! Very interesting!

For 1, extracting vector points from raster, are we talking about Trace? Were points single-pixel or small clusters of pixels? Trace extracts areas, so if you needed points, there was a lot of unneeded computation that could have been eliminated. We could probably have an option to convert small areas into points; this would have helped the performance, likely significantly.

For 2, ST_SubDivide, we hear you. We are also planning to make it less necessary to run things like ST_SubDivide on large geoms - although if you are willing to deal with merging the results back, it'd still make sense to run it.

In general, our focus is on making 9 a much better fit for your "Smaller data" scenarios. That's what we are concentrating on currently.

Thanks again, a great write-up.

artlembo


3,400 post(s)
#01-Apr-21 20:52

extracting vector points from raster: as I mentioned, the raster is binary (1/0). So, any pixel with a 1 we'll turn into a point, and ignore the 0s. Therefore, I wasn't interested in areas, but rather individual pixel centroids. I was attempting to use TileGeomToValues. But, the 330 billion pixels was just too much to deal with. I even wrote a script to cycle through the st_subdivide geometries to break it into smaller chunks, but to no avail.

tjhb
10,094 post(s)
#01-Apr-21 22:14

Art, I'm curious, did you try a workflow that did not convert the raster(s) to centroids?

I don't know the details, but perhaps you could just decompose the monkey polygons into triangles, or into convex parts (a familiar step to both of us), then Join those smaller areas to the original rasters. (Then sum.)

Raster storage is massively efficient, especially in Manifold 9, and the Join dialog is flexible enough to allow raster-vector analysis without translation. Power.

Tim

artlembo


3,400 post(s)
#01-Apr-21 22:46

Good advice. And yes, I did attempt to do that. The storage, as you say, is very efficient. I think the problem is when you need to unpack the data to find out which pixel actually falls in an area. It’s the classic compression/decompression problem.

I think the join command is awesome. But, having 330 billion pixels really does clog the pipes up a bit :-)

Nonetheless, I hope you are as impressed as I am that the actual overlay got done in 17 minutes. I tried to do the overlay in postGIS with the subdivided polygons and it was taking hours and hours so I just killed the process. 17 minutes, wow.

tjhb
10,094 post(s)
#01-Apr-21 23:01

Yes I am just as impressed as you are. For two reasons. (1) You made something impossible into something possible. That's helpful! (2) At 17 minutes, you can afford to experiment or to tweak. At many hours to days, you have to get the thing right once, or you're either stuffed (if it didn't work) or quite grumpy (if it worked but suboptimally).

In both cases, a difference not just in degree, but in kind.

adamw


10,447 post(s)
#02-Apr-21 14:38

OK, so just to recap:

We have a raster of 330 billion pixels (that's short scale, right? 330 * 10^9?). In your example they are either 1 or 0, but I presume we can expect something like INT8 or INT16 and maybe conditions like BETWEEN 200 AND 300.

We need to turn each pixel that satisfies the condition (in your example, equal to 1) into a geom.

There are 61 million of such pixels in total (that's from the first post).

Using TileToValues (TileGeomToValues was likely a typo, right?) is too slow.

Correct?

61 million geoms is manageable, so if the numbers above are correct, a reasonably fast way to do this seems totally within reach.

artlembo


3,400 post(s)
#02-Apr-21 15:06

yes, you have that mostly correct. More specifically:

the raster: 1,295,779 x 256,004

and yes, mine is such a limited use case that I think it is reasonable to expect a BETWEEN clause for other uses.

TileToValues: I actually used TileGeomToValues, and yes, that was too slow. That gave me the X, Y, and Value. I did not actually try TileToValues. I think TileToValues would not work well on 330b pixels - it seems to first want to eat the elephant in one bite. I thought TileGeomToValues would be better because we are narrowing the area that we are searching.

Also, in this case, the data is very sparse - only .0004 of the pixels have a value we are interested in. So, I'm guessing there are tons of tiles that have nothing in them. I'm not sure if that helps any.

adamw


10,447 post(s)
#03-Apr-21 17:10

I did some experiments and, you know, using TileToValues isn't too bad, but yeah, there's room for improvement.

My test images were: small (441 million pixels), large (44.1 billion pixels, it's just the small image increased 10x by both X and Y). Your image is 7.5x larger than my large one, but my large is likely large enough to make things scale linearly, so we can just multiply the times for my large image by 7.5x and that would produce some estimates for your image. I was creating points for pixels with a value that occurs slightly more frequently than yours (1.5x or so), but perhaps close enough.

At first I started with the simplest query I could write:

--SQL9
SELECT GeomMakePoint(VectorMakeX2(tx*128+x, ty*128+y)) AS [geom]
INTO [geom202] FROM (
  SELECT x AS tx, y AS ty, SPLIT CALL TileToValuesX3([tile]) FROM [small]
) WHERE value = 202;

This finished in 615 seconds. In the top-level query, TX is the X coordinate of a tile, X is the X coordinate of a pixel within the tile, 128 is the tile size, so TX*128+X is the X coordinate of the pixel in the whole image, same for Y.

I tried optimizing the query on the small image before running it on the large one (who has time to run unoptimized first versions on full data), and filtered out tiles which have no pixels with the desired value:

--SQL9
SELECT GeomMakePoint(VectorMakeX2(tx*128+x, ty*128+y)) AS [geom]
INTO [geom202_2] FROM (
  SELECT x AS tx, y AS ty, SPLIT CALL TileToValues(tile) FROM [small]
  WHERE TileValueMin(tile) <= 202 AND TileValueMax(tile) >= 202
) WHERE value = 202;

This finished in 95 seconds.

I pushed computing TileValueMin / TileValueMax to threads, etc.; this got the time down to 68 seconds, but the query became large, so I just kept the one above that finished in 95 seconds. Ran it on the large image; it finished in 3448 seconds. I expected it to take longer, but there are various economies of scale which let it take just 36x as long as the query for the small image while the amount of data increased 100x (this happens; unfortunately a lot of it stops once we run out of the cache, and with the large image we are already out of it, so there likely won't be any such savings as we increase the amount of data further).

OK, so this means that for your 330 billion pixel image we are probably looking at something like 7 hours. Given 1 byte per pixel, the size of your image is 330 GB + something for masks (if all pixels are visible, this is negligible, so let's not count it). This is a lot of data; just reading it is going to strain the disk. The ballpark figure for how much data we would normally expect to read + analyze per unit of time is 50 MB/sec - obviously, modern disks can provide data at a faster rate, but data isn't always coming sequentially and there are all kinds of delays during processing, so 50 MB/sec is a fair number. With that, ideally we'd expect to process 330 GB in about 2 hours. So, there is a gap between 7 hours (expected from the query above) and 2 hours (ideal).

We think we can reduce this gap or maybe even close it completely with a couple of small additions and extensions to SQL. We'll see.

tjhb
10,094 post(s)
#03-Apr-21 19:21

Small anomaly? TileToValuesX3 in first (simplest) query.

Also, I'm interested in the combined prefilter

WHERE TileValueMin(tile) <= 202 AND TileValueMax(tile) >= 202

which would not have occurred to me.

Can you explain why, even though it is a composite test (scanning each tile twice?) it is so efficient?

And then... how about swapping the functions/conditions to

WHERE TileValueMin(tile) >= 202 AND TileValueMax(tile) <= 202

and omitting the outer WHERE clause, redundant now because 202 will uniquely satisfy the prefilter?

...Well, I think I can partly answer myself.

TileValueMin() <= x and TileValueMax() >= y will both short-circuit, i.e. return true as soon as they find any value satisfying their conditions. They will only scan the full tile if the condition either fails or succeeds only on the very last value.

But TileValueMin() >= x and TileValueMax() <= y will have to scan the full tile (twice) in all cases.

So this swap is a very bad idea!

But I'm still interested in why your dual pre-scan is efficient. I wouldn't have thought of this.

adamw


10,447 post(s)
#06-Apr-21 13:56

TileToValuesX3 should be TileToValues. (One of the tests was operating on a 3-channel image, taking one channel out using TileChannel, I mixed up things by mistake.)

TileValueMin ... AND TileValueMax ... does scan the pixels of each tile twice (once per function), but this happens in memory, which is fast. What would have ruined it is if the data set, which resides in a mix of memory and disk, were being scanned twice, but it only has to be scanned once.

I couldn't use WHERE TileValueMin(tile) >= 202 ... because in my test, the tiles contained a mix of values on both sides of 202, so if a tile contained pixels with values of 150, 202, 250, the test would have thrown it away - while I needed it to stay because 202 is there. Yes, anything in AND can short-circuit. Keep in mind that the query engine can re-arrange or re-write conditions, though, if it thinks it can improve things (we mostly improve the blatantly obvious).

adamw


10,447 post(s)
#06-Apr-21 14:15

A small addition: scanning the pixels of each tile in memory twice improves the performance of the query because doing this is faster than converting each pixel into a record, then iterating on these records outside of the functions in the query engine. Converting anything to records - even virtual ones - and iterating on them is not free.

artlembo


3,400 post(s)
#03-Apr-21 22:57

I’ll give this a try tonight, and report back what I find.

artlembo


3,400 post(s)
#04-Apr-21 01:08

Adam,

I’m willing to bet that it will go a little bit faster than you think. The reason is, you are computing information about 330 billion pixels. But, a majority of the tiles are all zeros. So hopefully, 95% of the image will just be ignored and you’ll only be processing the small portion of the globe that has the mangroves in them. I don’t know if some of those statistics are already calculated about the tiles or not. If not, then yes, you are probably going to have to search through 330 billion pixels. But if they are predetermined with summary statistics then they can simply be ignored and none of that data needs to be processed.

tjhb
10,094 post(s)
#04-Apr-21 02:04

Yes. So the tile prefilter you will apply is just

WHERE TileValueMax([tile])

right?

artlembo


3,400 post(s)
#04-Apr-21 04:02

Exactly. I will be heading into lab until the afternoon, so I’ll let you know how it turns out
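Roughly, the query should end up looking something like this - just a sketch adapted from Adam's, with placeholder component names:

--SQL9 (sketch only - [mangroves] and [mangrove_points] are stand-in names)
SELECT GeomMakePoint(VectorMakeX2(tx*128+x, ty*128+y)) AS [geom]
INTO [mangrove_points] FROM (
  SELECT x AS tx, y AS ty, SPLIT CALL TileToValues(tile) FROM [mangroves]
  WHERE TileValueMax(tile) >= 1   -- all-zero tiles are skipped entirely
) WHERE value = 1;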

artlembo


3,400 post(s)
#05-Apr-21 14:08

so this is interesting - it took 87 minutes. Again, that is likely due to the fact that most of the tiles are empty. So, 87 minutes is certainly acceptable to me. But, here is another issue: it extracted 121 million points. So, it extracted double the points. I'm now going to see if 9 brought in too many points, or if the person who sent me the points from ArcGIS gave me too few.

danb

2,064 post(s)
#05-Apr-21 20:35

I am not sure if this is relevant to your case, but one time when I was putting together a multistep workflow involving large binary mask grids, I found it beneficial to do a pre-scan using (I think) TileCompare to identify all tiles that matched my JSON definition of a blank tile (there are probably better ways to do this).

I added an additional column to the image table and set an attribute to mark tiles that I could safely ignore from all further steps of the process and this made a big difference to the performance of the workflow as a whole.
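A minimal sketch of that idea in SQL9 (hypothetical table and column names, and using TileValueMax as the blank-tile test rather than TileCompare):

--SQL9 (sketch only)
ALTER TABLE [mask_image] (ADD [blank] BOOLEAN);            -- flag column for tiles to skip
UPDATE [mask_image] SET [blank] = (TileValueMax([tile]) = 0);
-- later steps can then filter with: WHERE NOT [blank]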


Landsystems Ltd ... Know your land | www.landsystems.co.nz

tjhb
10,094 post(s)
#05-Apr-21 20:51

Nice!

Besides TileCompare, you could equally use TileMax or TileMin I think? That might be faster.

danb

2,064 post(s)
#06-Apr-21 00:11

Thanks for the pointer. This does make more sense thinking about it.


Landsystems Ltd ... Know your land | www.landsystems.co.nz

adamw


10,447 post(s)
#06-Apr-21 14:06

Well, I am not sure how the query with TileToValues can extract duplicate points. It's pretty straightforward. I would guess the extraction performed in ArcGIS was not 1 pixel = 1 point, perhaps it was merging points for small clusters of pixels. That's just a guess though.

Anyway, I can report that we did manage to eliminate the gap between the ideal processing time (based on IO rate) and the actual processing time entirely.

Here's a modified version of the query that you will be able to write in the upcoming build:

--SQL9
SELECT GeomMakePoint(VectorMakeX2(tx*128+x, ty*128+y)) AS [geom]
INTO [geom202] FROM (
  SELECT x AS tx, y AS ty,
    SPLIT CALL TileToValues(TileMaskRange(tile, 202, 202, TRUE, TRUE), FALSE)
  FROM [large]
);

There are two changes: (1) the new function called TileMaskRange takes a tile and a range of values (here, min=202, max=202, so just 202), checks for each pixel whether its value falls into the range, then turns that pixel invisible depending on the result (here, pixels with values falling into the range are kept visible and all others are turned invisible), (2) the new argument to TileToValues skips reporting invisible pixels.

With these changes, the time to process the large image (which used to be 3448 seconds) drops to 1091 seconds. Which is pretty much the lower bound expected for processing 50 GB of data (1000 seconds).

No threads needed as this is already bound by IO as it stands.

artlembo


3,400 post(s)
#06-Apr-21 15:47

thanks. This is great, I look forward to trying it out. I ran the overlay with the 121M points, and the overlay worked in 24 minutes (up from 17 minutes with 61M points). So, this is plenty fast.

I decided to rerun the query with the original monkey polygons (as opposed to the ST_SubDivide version). It has now been over an hour, and it is still processing. So, the ST_SubDivide was a big win in this case.

tjhb
10,094 post(s)
#06-Apr-21 18:09

GeomToConvex(Par) would probably do just as well as ST_Subdivide? And might even be faster with appropriate use of threads?

artlembo


3,400 post(s)
#06-Apr-21 18:32

maybe. I’ll test it out. It’s just nice to be able to define the number of vertices.

artlembo


3,400 post(s)
#06-Apr-21 19:20

still running after 4 hours, so I killed it. Therefore, ST_SubDivide is definitely necessary. I'm going to try Tim's suggestion next about Convex Parts. We'll see how that works. It creates many small polygons, but it also has some really large ones. So, if I were a betting man, I'd say they should be close. Hopefully, I'll have an answer in about a 1/2 hour.

artlembo


3,400 post(s)
#06-Apr-21 20:18

doing the overlay with the ConvexParts took 34 minutes vs. 24 minutes with the ST_SubDivide. Given that the problem was virtually unsolvable without splitting the polygons, ConvexParts is perfectly acceptable - that is, what's 10 minutes? And, it doesn't require you to move data in and out of Postgres to perform the ST_SubDivide.

Nonetheless, I still think for big data analysis, ST_SubDivide and the ability to create a hexagonal grid are important tools to have.

tjhb
10,094 post(s)
#06-Apr-21 20:28

Excellent! Well said.

If we had the equivalent of ST_Subdivide I would certainly use it as well, especially on contour-derived and landcover data.

I'm glad that convex parts is comparatively not too bad though! Good test Art. This whole thread increases community knowledge.

danb

2,064 post(s)
#07-Apr-21 00:37

I'd always been laboring under the impression that something mathematically magical happened regarding point in polygon operations as soon as convex polygons came into play. Is this not the case? Is simply splitting complex areas into many simpler parts enough to gain additional performance?


Landsystems Ltd ... Know your land | www.landsystems.co.nz

tjhb
10,094 post(s)
#07-Apr-21 01:11

I think ST_Subdivide (also) guarantees convexity. All the resulting parts are convex I think? [No.]

So the magic applies here too.

[I am wrong. Concavity is allowed in the result of ST_Subdivide.]

tjhb
10,094 post(s)
#07-Apr-21 02:15

It would be interesting to know the maximum number of vertices Art specified for ST_Subdivide (and how sensitive this was).

rk
621 post(s)
#19-Apr-21 19:49

I just learned that, despite the impression I had from reading the PostGIS docs, the ST_Subdivide algorithm is not part of GEOS/JTS/NTS.

It is implemented in PostGIS lwgeom.c (lwgeom_subdivide_recursive)

So, sadly, my NTS integration project will not give us an ST_Subdivide equivalent.

artlembo


3,400 post(s)
#20-Apr-21 02:28

<deleted>

checking on some different functions with ArcGIS Pro to see how their Spark engine handles all of this. Will let you know my result in the morning.

artlembo


3,400 post(s)
#25-Apr-21 17:36

just ran the process with ArcGIS Pro. Now, keep in mind, I have a license for the GeoAnalytics Server - it uses Spark for processing, so it can perform parallel processing. But, it is an extra feature that you have to pay for separately.

Anyway, the summarization of the 121 million points into the monkey polygons took 12 minutes, as opposed to the 24 minutes in Manifold 9. So, ArcGIS Pro does an excellent job processing the data. Just so you know, ArcGIS Desktop could not complete the overlay after a couple of days, so using the GeoAnalytics server was necessary. Similarly, Manifold 8 could not complete this, and Manifold 9 couldn't complete it without subdividing the areas. So, comparing ArcGIS GeoAnalytics Server with Manifold 9 is a closer comparison.

But, a couple of caveats:

I had to turn the raster pixels to points, and that required two steps (if you recall, Adam's query also did it in 2 steps, but embedded the entire thing in an SQL query):

1. set all pixel values = 0 to Null (2 hours)

2. run RastertoPoint to generate the points (3.5 hours)

so, that was 5.5 hours to prepare the data. If you recall from my comment below, Manifold 9 ran it in 87 minutes.

I think ESRI is going to have to revisit RastertoPoint and see if they can shave some time off of it.

But, this gives some comparison of the two products. I'll leave it to others to comment on price, ease of use, etc.

tjhb
10,094 post(s)
#25-Apr-21 22:20

Great result Art, great report too (with a large dose of understatement).

Do you happen to know what the ESRI GeoAnalytics Server option typically costs for commercial users? I suppose it is a silly question. If you have to ask, you can't afford it. It depends on the number of licences you need. It depends on your golf handicap. Let's finish this excellent lunch before we talk about money. And so on.

artlembo


3,400 post(s)
#25-Apr-21 23:48

Here is a price list for Idaho. 40 pages. Just remember, there are a couple of things to be aware of:

  • there is a price for the software - that’s what you pay up front.
  • there is a maintenance price, if you want support and upgrades.
  • there are some options for a 365 day license.

Also, the GeoAnalytics server comes with 4 cores, and you pay for each additional core after that.

tjhb
10,094 post(s)
#25-Apr-21 23:57

Thanks. That is much worse than I thought. It's hard not to feel sick after reading that.

Have you ever wondered, Art, how much you have saved the US taxpayer, and private US firms, and other universities, just by making your students aware of the excellent alternatives to ESRI products that exist in the wide world of sports?

antoniocarlos

609 post(s)
#25-Apr-21 23:57

Ja. I really don't know how to read this price list, except that if you really want to save 12 minutes you have to shell out $18,000 + the rest of the ESRI bundle.


How soon?

tjhb
10,094 post(s)
#26-Apr-21 00:51

That "time saving" is (87 minutes - 5.5 hours) + (24 minutes - 12 minutes) = -243 + 12 = -231 minutes.

So all up, if I have read correctly, ESRI's products took 3 hours 51 minutes longer than Manifold 9, in Art's carefully worked example.

That's what I meant by his large measure of understatement!

artlembo


3,400 post(s)
#26-Apr-21 01:49

Well, the reality is, it isn’t that simple. You aren’t buying an ESRI product to save 12 minutes on a particular one off job. You’re purchasing it to do your work 365 days a year. A lot of people on the forum are single users, so those prices are out of reach for sure. But, a consultant doing millions of dollars of work a year barely notices the cost of the software. It’s basically budget dust. When I was in private industry we had over a dozen licenses, and we kept all of those licenses very busy doing a lot of work. And remember, those licenses gave us a connection to other ESRI users, which increased our work.

Recently, our lab did a small project using the ESRI data collection app. The data collected from that and the maps created enabled this community to then get a $1.6 million community improvement grant. That’s a pretty big return on investment, and sure, you could save money by using open source, but the bottom line is the cost of the ESRI software is almost nonexistent when you are talking about add-on projects valued at over $1 million.

And in all of my travels my entire career, I have never come across another manifold user at an ice cream shop, workshop, or conference. But, if I were to throw a frisbee I could probably hit an ESRI user. So, there are great benefits to having thousands upon thousands of people out there that know how to use the software.

So at the end of the day, ESRI supports hundreds of thousands of projects around the world and has brought great benefit to the planet. I would liken it to SQL server or Oracle. Lots of organizations with really mission critical applications could use open source products but they choose not to. And, when you start to scale up Oracle it gets very expensive. But, those organizations understand the cost benefit.

Now, all that said, a lot of people have really benefited from using either manifold or open source software because they can finish the project at very low cost and deliver the results to a client. I think it does allow an individual person or a very small organization to have some pretty remarkable tools so that they can potentially compete for large jobs.

tjhb
10,094 post(s)
#26-Apr-21 02:00

I get that it is not that simple--but that is the really big problem.

Kool-Aid works! Especially when poured into tall glasses by hand-wringing suits(TM).

It's just not fair business, and it wastes massive amounts of public money.

Public money is mostly just printed now, the world over, but still some of it comes from taxing real production and wages.

(And printed public money, while it may not dilute private wealth, amplifies private poverty, especially if the resulting public money is wasted.)

tjhb
10,094 post(s)
#26-Apr-21 01:16

This is my least favourite part.

Attachments:
6112C6DA-DB8B-410B-97BE-639D1D340584.jpeg

artlembo


3,400 post(s)
#26-Apr-21 02:01

Well, I’m the guy who plays golf late in the day so I can take advantage of the lower twilight prices so I’m not the best person to necessarily comment on this :-)

but, the remote support is $275 an hour. I think that is probably around what most civil engineering firms charge for their engineers who do design work. Having an accountant do your taxes probably costs between $300-$500. They probably only work one or two hours on the tax return.

The on-site support is expensive, but consider the mission critical nature to what is being done. Imagine, it’s Christmas Eve and the pipes in your house just burst. What do you think a plumber is going to charge you to come out. Most plumbers charge between $75 and $100 per hour. I bet that doubles when your basement is flooding on Christmas Eve and you need someone to fix things.

I’ve been in academia for a long time now, but sometimes I get pulled in to do some one-off projects and I am astonished at how much is being charged. I’ve often been given advice to not lowball projects because you aren’t taken seriously when most people are aware of what the going rate actually is.

tjhb
10,094 post(s)
#26-Apr-21 02:13

I was perhaps misreading. I took "disaster" to mean, e.g., Katrina, not burst pipes or a failed server. Dunno. That's why that struck me. ESRI does, probably, provide real disaster support for free, and if so good on them.

I wonder if GIS tech can materially help India at present? I doubt it can help much, in real time anyway. It's not simple--but good on the USA (and also the UK) for giving so much aid right away.

artlembo


3,400 post(s)
#26-Apr-21 02:23

The burst pipe was a joke on my part. I think you read the price list correctly. And yes, I’m sure they provided a lot of free support but in some circumstances they probably put people on site to help in an emergency situation. The money they get paid probably doesn’t come out of little old ladies savings plans, but directly from FEMA :-)

danb

2,064 post(s)
#26-Apr-21 00:24

We get our ESRI toolset under a corporate bundle. I believe that unlimited licenses for most of the ESRI suite is part of the deal. GeoAnalytics Server is included, though I don't think it has been deployed yet.

Because the cost is in one big annual chunk I don't think that our bean counters would even consider the actual/individual cost of the software components. It is easy to budget for.

I guess what really swings it for those that do the purchasing (non users) is firstly that most other govt organisations in NZ use it, and secondly the armies of hand wringing suits available 24/7 to tell them what great work they are doing and how innovative we are. It's a shallow ploy, but everyone likes to be flattered.


Landsystems Ltd ... Know your land | www.landsystems.co.nz

tjhb
10,094 post(s)
#26-Apr-21 01:04

It is easy to budget for.

Not everyone gets your sense of humour as well as I do Dan!

Dan is being ironic! (And a bit lamentful I expect.)

Like Art, Dan has done a massive amount to advance the use of Manifold, both 8 and 9, at the public sector coalface.

I would like to say more but won't.

Dimitri


7,413 post(s)
#26-Apr-21 08:25

So, ArcGIS Pro does an excellent job processing the data.

I think it would be more accurate to say "GeoAnalytics Server does an excellent job processing the data."

If you're using GeoAnalytics Server, you're running ArcGIS Pro as a dumb client. GeoAnalytics Server is doing the work. Right?

artlembo


3,400 post(s)
#26-Apr-21 11:50

I’m still trying to get to the bottom of how it is implemented. It is considered an extension, but in the implementation it appears to be transparent to the user. It looks like you’re using a Spatial Analyst tool, but in reality you’re using the GeoAnalytics tool.

artlembo


3,400 post(s)
#27-Apr-21 14:45

also, to follow up, the ArcGIS tools are exactly alike, whether using Spatial Analyst or GeoAnalytics Server. The messiness of setting things up is hidden from the user. And, they can easily be placed in a model using Model Builder, just like any other ArcGIS tool:

so, that is rather nice.

I'll also say, the results between Manifold and ArcGIS Pro are 0.9998 similar. A handful of points are counted differently. I imagine due to each system doing something slightly different for the containment clause.

On another note, having done these runs over and over for the IUCN researchers, I can say that the Join feature in 9 is really slick, and very quick to spit out results. In fact, I was considering just rerunning the SQL and changing the name of the monkey polygons, etc. (and you guys know how much I like SQL), but honestly, the Join function is just so simple to use, and quite flexible.

I'll say that I like the ArcGIS Pro Summarize Within Tool, but can also say that the Join function in Manifold is a different way of doing things, and also quite pleasant to use.

Attachments:
modelbuilder.jpg

Dimitri


7,413 post(s)
#27-Apr-21 18:01

The messiness of setting things up is hidden from the user.

Maybe I'm missing something, but I don't see how the messiness of setting up GeoAnalytics Server is hidden from the user. It looks like a pretty involved process. Is there a shortcut somewhere?

As far as I can tell, ArcGIS Pro working with GeoAnalytics server is somewhat analogous to QGIS working with PostgreSQL/PostGIS. You can't just fire up QGIS and start doing stuff in your project. You have to install PostgreSQL, add PostGIS, load up the DBMS with your data, and then you can connect to it using Q as a client and do things in PostgreSQL. But you can't do something like write an SQL query within your project that joins data stored in a local shapefile with what's stored in PostgreSQL, or with what's stored in, say, Oracle.

It looks like Pro has better integration with GeoAnalytics Server than Q has with Postgres, at least for their own toolset. But, alas, not with SQL. If you want to do real SQL with a file geodatabase from ArcGIS Pro it appears that cannot be done. But you can do real SQL against an ESRI file geodatabase from 9. :-)

artlembo


3,400 post(s)
#27-Apr-21 20:16

Maybe I'm missing something, but I don't see how the messiness of setting up GeoAnalytics Server is hidden from the user. It looks like a pretty involved process. Is there a shortcut somewhere?

yeah, I was pretty stunned to be honest with you. But, let me clarify:

when I first read the help manual, I thought: no way I'm doing that! It reminded me of my Hadoop work. But, as I was messing around with Summarize Within, I saw that it was offered as part of Spatial Analyst and also as part of GeoAnalytics Server extensions. So, I simply tried out the GeoAnalytics Server part, and whatdya know, it just worked.

I think the issue is that you can use ArcGIS to stand up a true distributed server, or, just use it out of the box, and it will take advantage of whatever cores you have on your computer. So, that's what I did - just used it out of the box.

So, as you say, ArcGIS Pro has very good integration with GeoAnalytics server, as long as you are willing to just use the cores on your computer. And yes, no SQL. So, some of the flexibility we enjoy with SQL is not there yet. But, its integration with Python is quite good.

QGIS can perform spatial SQL on a geodatabase, but they use Spatialite as their engine, and because it doesn't reside in an SQLite database, it is dreadfully slow. If you need more details on this stuff, feel free to message me.

Dimitri


7,413 post(s)
#28-Apr-21 08:48

QGIS can perform spatial SQL on a geodatabase, but they use Spatialite as their engine, and because it doesn't reside in an SQLite database, it is dreadfully slow.

Art, I don't see how that works.

Example:

1. Download the Naperville Gas file geodatabase and unzip to get an example file geodatabase.

2. Launch QGIS and add a vector layer from that GDB using the OpenGDB driver that's built in. Add the Tax Parcels layer.

3. OK, so now you have a NapervilleGas TaxParcel layer. How do you do real SQL on that? From the main menu you can choose Database - DB Manager and then drill down through Virtual Layers - Project Layers - NapervilleGas TaxParcel to pick that, and then you can open an SQL Window by clicking on the wrench icon.

4. Super. Now you can write queries like

SELECT * From [NapervilleGas TaxParcel];

but you can't write things like

SELECT * INTO [New TaxParcel] FROM [NapervilleGas TaxParcel];

Are you saying Q is launching SQLite and temporarily moving data from the GDB into an sqlite data store, running the query on that, and then showing results?

Dimitri


7,413 post(s)
#28-Apr-21 12:14

A quick follow up: Couldn't resist trying it. :-)

Basically, it seems with Q if you're not executing queries within some external database, you're limited to using SQLite to generate views of vector layers, what they call virtual layers. It appears that Q takes your GDB table, converts that into an SQLite temp file, runs the SQL SELECT to generate a view and then links that view into a new virtual layer.

So you can do SELECT, but there's no manipulation of the original data, for example, with an ALTER or UPDATE statement. For example, in Manifold if you want to use SQL to update the TaxParcel table in your ESRI file GDB you can write:

UPDATE [NapervilleGas.gdb]::[TaxParcel] SET [CVTTXDSCRP] = 'Village of Lisle' WHERE [CVTTXDSCRP] = 'Lisle';

But it looks like you can't do that in Q.

Given that you can't do an UPDATE, you also can't join into a table, not even doing something as simple as the Join Example: Add Publisher Name to a Table of Book Titles example, the UPDATE query being:

UPDATE ( SELECT t.[title_id] AS tkey0, t.[pub_name] AS t0, s.[pub_name] AS s0 FROM [titles] AS t LEFT JOIN [publishers] AS s ON t.[pub_id] = s.[pub_id] ) SET t0 = s0;

I've tried a bunch of different ways to update an original table, but haven't been able to find a way. If anybody knows how, I'd be grateful to learn how. [I mean, of course, using real SQL to manipulate the original table in place, and not along the lines of loading the table into PostgreSQL and doing the manipulation there...]

artlembo


3,400 post(s)
#28-Apr-21 13:37

yes, that's true as far as the geodatabase is concerned. If you are connected to GeoPackage, PostGres, or SQLite, you can do UPDATE statements from QGIS in the Database Manager. At the moment, you can use QGIS to edit a geodatabase from the GUI, but if you use the Database Manager, it appears that QGIS only accesses it as a "virtual layer". The virtual layer is really a VIEW, so no editing. That might be worth making a request to the QGIS developers.

One note: SQL on Postgres, geopackage, and sqlite is quite good, as it actually uses their command language. The virtual layer seems to use Spatialite, but because it isn't actually in a SQLite database, it is horrendously slow.

Dimitri


7,413 post(s)
#28-Apr-21 16:19

SQL on Postgres, geopackage, and sqlite is quite good, as it actually uses their command language.

Well, sure, because you're doing SQL in the external database package, not in Q. :-) Q is just being a dumb client, like using a browser connected to the database package's web interface.

There's a huge difference between having real SQL in the GIS and just being a client to something else that has real SQL. I don't mean to pick on Q as that also applies to all the other desktop GIS packages that don't have real SQL either, like FME, ArcGIS Pro, and so on.

So what are the downsides of not having a real SQL engine within the GIS itself? Lots...

For starters, when the only SQL you get is whatever SQL the data source you're connected to provides, that means if your data source doesn't provide an SQL you don't have SQL. That means no SQL for most of those hundreds of formats and data sources to which GDAL connects.

Second, when you do get SQL it's a different SQL depending on the data source. Can't write one query that works in all the data sources. If you're connected to Oracle, to Postgres, and to SQL Server you have to write three different queries using three different SQL implementations you've learned.

Third, that means your queries can't utilize data between data sources. Want to join data from a table in SQL Server or a file GDB into a destination table in Oracle? No can do, because the SQL engine sitting inside your Oracle server doesn't know your Arc session is also being a client to a SQL Server database.

When you have a real database and SQL engine inside your GIS, then the above limitations don't apply: you get full SQL with hundreds of formats and data sources, you can write a single query text that will work against all of those data sources, and you can mix data in the same query from different data sources.
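For example, a cross-source join in 9 can be sketched like this - the [oracle] and [gdb] data source names and the fields are made up, following the same UPDATE-on-join pattern as the publishers example above:

--SQL9 (sketch only - data source, table, and field names are hypothetical)
UPDATE (
  SELECT t.[parcel_id] AS tkey0, t.[owner] AS t0, s.[owner] AS s0
  FROM [oracle]::[parcels] AS t
  LEFT JOIN [gdb]::[TaxParcel] AS s ON t.[parcel_id] = s.[parcel_id]
) SET t0 = s0;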

Just saying, sure, PostgreSQL is a cool DBMS. But GIS is a lot better when you get the power of full SQL for everything and not just when connected to something else that has it.

See the Real SQL page.

artlembo


3,400 post(s)
#28-Apr-21 17:23

yes, for sure, Manifold gives lots of flexibility in that, as far back as 4 years ago.

But even still, as an old lady told me at a nursing home: we do the best we can with what we've got left. So, it is still possible to issue some SQL commands using the Database Manager in QGIS. Sadly, ArcGIS does not have this kind of functionality (and, I've pressed them on it for almost 20 years).

Dimitri


7,413 post(s)
#29-Apr-21 08:16

The virtual layer seems to use spatialite, but because it isn't actually in a SQLite database, it is horrendously slow.

Ah, sorry I missed the above. To clarify, it's the opposite: the reason it is horrendously slow is because it is in an SQLite database.

Q's calling those layers "virtual" layers is something of a head fake. They're not at all "virtual", in the sense of how people normally use the term, that is, being memory resident in some temporary data structure within QGIS's own internal architecture.

Instead, Q is creating an SQLite file database, a real, tangible file, and loading it with data so the SQLite engine that is built into that file database can work on it. That's why it is so slow. It has all the overhead of creating a new file database, loading it with data, calling the external engine to work on it, and then reporting the result as a layer in Q.

You could do exactly the same thing by simply taking the data of interest (say, what's in a GDB), and writing it out to a GPKG file (which is just an SQLite file database), and then connecting to it with Q and issuing SQL commands within that GPKG file. In fact, you could do more because then at least within that file database you'd have full SQL as offered by SQLite. It's an SQL subset, but it's a relatively big and useful subset.
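For example, reusing the hypothetical Naperville names from earlier, once the data is sitting in a GPKG you can run ordinary SQLite SQL against it directly:

-- SQLite, run against the GPKG file (same example edit as the earlier GDB query)
UPDATE TaxParcel SET CVTTXDSCRP = 'Village of Lisle' WHERE CVTTXDSCRP = 'Lisle';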

I think that's why there's relatively less use of "virtual layers" than simply connecting directly to data stored in SQLite file databases. It's nice to have a view for sure, and it's cool that it's quick and easy to do that in Q, but for just the slightest extra effort and dropping the "virtual" pretense you can work directly with SQLite and get more capability.

tjhb
10,094 post(s)
#29-Apr-21 08:37

So that meaning of "virtual" is the complete opposite of the usual meaning?

The usual meaning is about using pointers, wherever pointers will do. (The so-called fundamental theorem.)

This meaning is instead about making deep copies? That is not very virtual at all!

Dimitri


7,413 post(s)
#28-Apr-21 09:04

I think the issue is that you can use ArcGIS to stand up a true distributed server, or, just use it out of the box, and it will take advantage of whatever cores you have on your computer.

I think what's going on is the same toolset is available either locally as a geoprocessing tool, just one that happens to be listed in the GeoAnalytics server section, or, you can also run that same tool in a GeoAnalytics server that you've stood up.

I make that distinction because ArcGIS, generally, has very poor parallelization. So running a job locally on just a few threads isn't likely to give you the performance that running a genuine, fully stood up in the cloud, GeoAnalytics server is likely to provide.

That's the general statement, of course. There's always the possibility that Arc does a particular parallelization exceptionally well and generates really good performance on the desktop. I haven't seen that yet compared to well-optimized parallelization, but it could well be out there.

I'm curious about GeoAnalytics server for two reasons: hooking up a distributed calculation across machines is more sophisticated than just running multiple threads in one machine, so I'd expect ESRI has applied more effort there, and I'm curious to see how that compares to the experience running Arc local.

I'm also curious since Manifold eventually will be providing fully automatic distributed processing across machines, and it's interesting to see how what ESRI is doing now with GeoAnalytics server compares to that.

Manifold's approach will be a lot simpler, maybe as simple as just installing Viewer on any machine you want to use as a computing node and then when you launch something in a desktop Manifold session it will automatically find any computing nodes available to you and just automatically dispatch, as if the whole thing was running local. There are obviously many moving parts to that which must be worked out, but there's no reason it could not be done more simply than how GeoAnalytics server is set up.

it will take advantage of whatever cores you have on your computer.

Not exactly. When Arc's parallelized tools (they have about 80) are run on a desktop computer they usually top out at between 4 and 9 threads, say, two to five cores with hyperthreading, even when you have the parallel processing factor set to 100%. If you run them on a 24 core Threadripper where Manifold will use all 48 threads, my experience has been that Arc will use only a few of those threads. Lots of the comparison videos make note of that.

Now, it's true most of my experience has been running raster computations on Arc. But those are far easier to parallelize than vector computations, so I'd be skeptical that Arc could use more cores in vector computations than it uses with rasters.

artlembo


3,400 post(s)
#28-Apr-21 13:33

I think what's going on is the same toolset is available either locally as a geoprocessing tool, just one that happens to be listed in the GeoAnalytics server section, or, you can also run that same tool in a GeoAnalytics server that you've stood up.

No, it's the actual GeoAnalytics tool that is being run, and it is separate and distinct from the Spatial Analyst tool. I know this because:

  • the tool itself says GeoAnalytics Server as opposed to Spatial Analyst
  • the task manager shows all of the cores lit up and running (and completes the job in 12 minutes), whereas the Spatial Analyst version does not (and never completes the job)

I haven't seen that yet compared to well-optimized parallelization, but it could well be out there.

well, I've seen it. It is the Summarize Within tool, the first one I tried, and it completes a job in 12 minutes. I'm assuming that 9 is fairly happy with its optimized parallelization, and the ArcGIS task completes faster. I haven't tested other functions yet. Maybe this summer.

If you run them on a 24 core Threadripper where Manifold will use all 48 threads, my experience has been that Arc will use only a few of those threads.

I don't have a computer like that to test it on, so I can only state that Arc fills all 8 cores on my computer. Now, if Manifold would like to make a donation to my University of a beast of a computer so I can test these things out, I'd be all for it. Summer is coming, and I have lots of time on my hands!!

Now, it's true most of my experience has been running raster computations on Arc. But those are far easier to parallelize than vector computations

again all the more reason to donate a computer. Father's day is coming up, and I'm getting tired of ties and golf balls. This would really hit the spot.

Manifold's approach will be a lot simpler, maybe as simple as just installing Viewer on any machine....

I love this idea. As I said, the Join command is really easy for people who don't want to write SQL. This could be a killer app at that point.

on another note, I think the next frontier for 9 might be ManifoldOnline. Sort of an answer to ArcOnline. Some kind of widget that allows you to drop a .map file on a server, and then browsers can view the contents of the .map, and even perform some basic analytical functions. Or, maybe export the .map to another file format that can be viewed online. ArcOnline is very slick. We see novices publishing maps left and right with ArcOnline really easily, and they look quite good.

Sort of like Manifold IMS, but even simpler to use with a standard gui that a user can navigate. QGIS sort of does this with a viewer by dumping everything out as a json file. It works from a viewing standpoint, but there isn't any functionality. But again, I think ESRI has done a very good job with ArcOnline, it would be interesting to see Manifold poke around in that same environment.

Dimitri


7,413 post(s)
#04-May-21 14:14

Some kind of widget that allows you to drop a .map file on a server, and then browsers can view the contents of the .map, and even perform some basic analytical functions.

Sorry I missed that a few days ago. The above will happen when Manifold adds IMS functionality to 9.

The idea is to build IMS into Manifold, including the web server. 8 has a nice idea with IMS, but to run it you need IIS or Apache on the serving machine, with all the configuration/admin/licensing (for IIS) issues that entails.

The 9 approach would be simpler and much easier. Whenever you save a .map file, the .map file itself is all that's necessary for a default web site. You wouldn't even have to think about "how do I publish this as a web site?" Just save the .map file the way you always do.

Put a free Viewer license on whatever machine you want to host your web site, put the .map file on the same machine, and, automatically, you have a GIS-enabled web site. No need for IIS or Apache. Because Viewer is free, there are no licensing issues. Anybody with a static IP or a hosted web site where the provider allows executables could have a GIS-enabled web site.

Add a few options in a Web menu (maybe part of the Layout system, as a Web Layout, or a collection of Web templates?) and you could save that .map (potentially) with web UI options that are more sophisticated than the defaults.

I don't know how that should work for functionality beyond viewing, as that's more a business decision than a technical one. I guess at a minimum if you were running full Manifold locally you could connect to the web page URL with full functionality beyond just viewing the Viewer-hosted website.

I guess also at a minimum, even simply with Viewer you get a huge amount of functionality, like full SQL and a few hundred analytic functions, that would automatically be available to such a viewer-hosted website, for those who wanted to pick or otherwise configure a UI template with more than default functionality.

Beyond that, it's not clear where the line should be drawn between free/read-only and paid/full functionality. But that's not the first and biggest question. Almost all of the various web-based maps are view-only, probably 99% of them, so just the free-Viewer-as-IMS-server solution as a super easy "save your project and you just published it to the web" solution covers that end of it.

artlembo


3,400 post(s)
#04-May-21 14:51

thanks. The issue is, will visitors to the site need to download a .exe to view the data? Having to download a .exe will cut down on uptake. I have some ideas on how to add the functionality, but will hold off for now, as I'm busy with some things. I may send an email to sales outlining things.

Dimitri


7,413 post(s)
#04-May-21 19:03

The issue is, will visitors to the site need to download a .exe to view the data?

No, not at all. It will just be a plain web site viewable in any browser, same as georeference.org, or Bing or Google maps.

Now, that's not to say there couldn't in addition to the plain HTML of the web site also be some optional web services provided that might be exploited by an intelligent client. There is already a lot of that going on in GIS web servers, what with the alphabet soup of server protocols, such as WMS, WFS, ArcGIS REST and so on.

Besides the obvious cases like the above, there are many scenarios where such optional connections would be useful, for example, remote multi-user editing using Release 9 as a super-smart client in field data collection applications.

Just speculating, given a web server built into every copy of 9 or Viewer, I suppose you could have a web-based field data collection application that you could connect to using a browser on your phone or tablet when you have internet, and when you don't have internet you could run exactly the same thing by launching 9 locally as a web server, to which your browser connects to show exactly the same interface. When you get back into Internet connectivity, the app running on your gadget connects to the website and batch updates with the data you've collected while out of Internet range.

The nice thing about having a web server built into every 9 and Viewer license is that it allows creating web-based apps that run on a server somewhere, or run on the Windows tablet/notebook you're holding in your hand, in both cases using the same very widely known standards.

artlembo


3,400 post(s)
#04-May-21 19:40

yes, and this begins to play in the same territory of ArcOnline, Survey123, etc.

One could build a .map file with the data in place, along with the constraints or triggers, I suppose. With different Map components, users could choose which one they want to see (i.e. the landuse map with buildings on top; the address points with the centerlines, etc.). This would enable an organization to create their own data repository of all the layers they maintain.

tjhb
10,094 post(s)
#04-May-21 21:41

In this context--whether we mean Manifold data served over plain html, or remote data accessed via an intelligent client (which itself could be more or less closely related to Viewer, who knows...)--I would like to see a concept like the following.

Between Manifold and Viewer, we currently have differential capabilities, effectively governed by Manifold itself through an optional sale and purchase agreement.

I would like to see something analogous within .map files--whether serving html, or showing data either remotely or locally--and this under author control.

That is, in a word, permissions, applying at the project, data source, component, and layer levels. (I mean something like the usual range of permissions: show, read, read-write, execute, change properties, export...)

Among other things, this would (a) facilitate distribution of data via Manifold, since an author could adjust client access according to trust, pricing and client training level, and (b) help to ensure data validity through enforcement of known version and state.

Permissions at the project level might govern the range of controls exposed to and usable by the user. This would be useful in an html context, of course, but also in native format if it allowed presentation of a custom, simplified, interface.

(It could even control parsimony/verbosity in the GUI to a useful degree. Things like buttons, tool tips. Discussed recently.)

The main point is allowing the project author to set product capabilities, on a project-by-project basis, similar to how Manifold itself controls product capabilities between the main paid product and the free Viewer (but more so).
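To make that concrete, here is a purely hypothetical sketch (no such feature exists today; the component name and the 'Access' property are invented) of how an author might grant read-only access to a single drawing, if permissions were ever exposed as ordinary component properties:

-- hypothetical: record a per-component permission as a component property
INSERT INTO [mfd_meta] ([Name], [Property], [Value])
VALUES ('Parcels', 'Access', 'read');

Enforcement would of course have to live in the engine rather than in the property itself; this is only meant to show the level of granularity I have in mind.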

Forest
625 post(s)
#10-May-21 04:46

Bit of a late reply, but I have another reason why I use Manifold, other than the donut. Manifold fixes most geometry problems on the fly, so it can process large datasets that have errors. I think errors become more of an issue with large datasets, particularly those that are compilations. Some other GIS systems stop processing when they hit errors like polygons with self-intersections. Manifold gets the job done for me in these cases.

pslinder1
228 post(s)
#10-May-21 15:33

This is an extremely good idea!

tjhb
10,094 post(s)
#11-May-21 00:31

Thanks for reading and saying so. I will make it into a proper feature suggestion then and submit it. (A bit buried here!)

adamw


10,447 post(s)
#22-May-21 13:05

I will just quickly say that we have thought about adding some kind of permissions for data stored in a MAP file and have a pretty good idea of how they could work. This is something for the future.

There is one thing, though, that you are likely after and that we don't have a good solution for: access to the code of queries / scripts. If you want the user to be able to run a query / script but not be able to see its code, that seems really hard to do: the protection cannot be at the level of the storage, because the services that run the query / script sit above the storage, so the storage has to allow those higher-level services to access the text of the query / script, which means the user can retrieve that text from the storage using his own query / script. We could add some UI restrictions, obviously, but all of this just seems very brittle.
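For example (a rough sketch; 'Text' as the property name is an assumption here, check mfd_meta in your own project), nothing stops the user from writing his own query that reads the stored text of another query straight out of the project's mfd_meta system table:

-- read the source text of a query component named 'Secret Query'
SELECT [Value] FROM [mfd_meta]
WHERE [Name] = 'Secret Query' AND [Property] = 'Text';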

But restricting access to records / possibly metadata for specific components is completely doable and the protection could be pretty solid.

tjhb
10,094 post(s)
#22-May-21 20:49

Thanks.

Regarding protecting code, I think I follow (I hope so). Queries and scripts are, in a sense, just text data, but their contents must normally be accessible (qua text data) in order to be executed.

I imagine that static compilation might be possible for some queries, and that in principle these could be stored in a less-than-readable form (some imagined Manifold IL); but probably not for all queries, since (out of my depth here) I think that not only dynamic execution but sometimes also dynamic compilation can be of the essence, especially because Manifold/Radian SQL is so ecumenical.

.NET scripts can be compiled to DLL (i.e. to Microsoft IL). Decompilation is possible but usually more inconvenient than is worthwhile. I expect this would be easier for inline SQL expressions and commands, exactly because these are just text, to be passed from the .NET context to the Manifold compiler at runtime, but still the decompilation step imposes some useful barrier, even if it is only a legal or moral one.

In the same way, I think UI restrictions, mainly on being able to directly inspect code text (stored as a component property) could still be worthwhile. You explain why that would be leaky: an only slightly determined user could use new code to read any existing code, even if hidden, since the system architecture depends on that possibility. On the other hand, this would be somewhat analogous to .NET decompilation (if not as difficult), in the sense that it is at least something that could be contractually forbidden. (Whereas clearly it would not be practical to forbid the inspection of code directly visible as text.)

tjhb
10,094 post(s)
#06-Apr-21 21:05

Both parts of

(1) the new function called TileMaskRange takes a tile and a range of values..., checks for each pixel whether its value falls into the range, then turns that pixel invisible depending on the result..., (2) the new argument to TileToValues skips reporting invisible pixels.

are going to be fantastically useful. Every day, brilliant.

As a general question, how much slower would it be to use explicit string arguments (which can be visibly meaningful) rather than Boolean flags (which require reference to the manual)?

I would often prefer the former if they were not significantly slower.

(At school--referencing the other great recent thread--we used not punch cards but the kind you could block out with a graphite pencil. Face to face with Turing, a great way to learn as Ron said. Just not sure we still always need Booleans as args.)

adamw


10,447 post(s)
#07-Apr-21 10:03

Agree, booleans aren't great - they are hard to distinguish from each other, and it is hard to remember what TRUE means for each.

String parameters would be slow, however. We are considering packing parameters into JSON for big functions like Kriging, but doing this for small functions is too expensive.

There are two other options:

(1) Replace boolean flags with named integer constants. If there are multiple flags, combine them using BITOR. E.g.: GeomClip(geom, clip, GeomClipInner, 0). Pros: works right now, and adding a new flag is better than adding a new parameter because the function prototype does not change. Cons: BITOR might feel too technical for SQL, there would be tons of named constants, and one can accidentally use constants meant for function A in a call to function B.

(2) Extend SQL to allow calls with named parameters. E.g.: TileMake(cx = 100, cy = 100, value = 5). Pros: all parameters can be annotated well, no restrictions on function design, and it works for user-defined functions. Cons: not traditional SQL, changing the name of a parameter breaks callers, and parameter names become unlocalizable.
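Put side by side, the two styles would look roughly like this (the second constant in the first call is a made-up placeholder, and the named-parameter syntax does not exist today):

-- option 1: named integer constants, combined with BITOR when several apply
GeomClip(geom, clip, BITOR(GeomClipInner, GeomClipSomeOtherOption), 0)

-- option 2: named parameters
TileMake(cx = 100, cy = 100, value = 5)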

Not sure about any of them. (I mean, 2 is tempting, but seems too brittle with parameter names having to stay the same. No idea how this would work out in practice, could be bad.)

tjhb
10,094 post(s)
#07-Apr-21 13:25

Thanks for explaining that!

Not really worth improving on the current situation then.

It's not broken after all, just requires thought and memory and, well, checking. None of that is bad.

Thanks again.

Dimitri


7,413 post(s)
#07-Apr-21 14:04

Agree, booleans aren't great,

Respectfully disagree. The best thing about booleans is that even if you are wrong, you're only off by a bit. :-)

drtees
203 post(s)
#19-May-21 01:03

The best way to create a boolean field that makes sense later on is to name it as a question. I use boolean fields to flag things like which geo-referenced photo locations I want to show on a map, out of the dozens of photos within a particular project.

adamw


10,447 post(s)
#22-May-21 13:13

(As a personal preference, names like 'occupied' seem best. 'is_occupied' could work as well. But please no 'occupied?' - we saw this several times, in government data no less, and the question mark at the end forces you to quote the name of the field all the time.)
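A quick illustration of the quoting nuisance (the [Sites] table is just an example):

-- a field name ending in '?' must be bracket-quoted every single time
SELECT [occupied?] FROM [Sites];

-- a plain name reads naturally
SELECT occupied FROM [Sites];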

artlembo


3,400 post(s)
#19-Apr-21 14:49

Just ran the new function you listed here:

50 minutes. That's down from 87m.

That's quite fast. And, I don't think there is another product out there that can actually accomplish this task with really large data. I have yet to find anything (GDAL included) that will allow a filtering of the data (in other words, all these other functions have to bring in everything, including the 0's). You should really tout this more.

Also, a slight change in my process: the raster values are not 1's and 0's. They actually hold a meaningful number (they store the number of square meters of mangrove in each pixel - a 30m worldwide raster!!). So, I had to filter for values between 1 and 900. This function made it really easy. So, the tool is adaptable to any case where you need to filter points from a raster that meet a specific criterion.
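For anyone who wants to try the same pattern, the query ends up roughly like this (a sketch only: the image name is made up, and the exact argument order for TileMaskRange and the new TileToValues flag should be checked against the function reference):

-- hide pixels whose value falls outside 1..900, then list only the
-- remaining (visible) pixel values for each tile of the image
SELECT [mfd_id],
  SPLIT CALL TileToValues(TileMaskRange([Tile], 1, 900, TRUE), TRUE)
FROM [Mangroves];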

Dimitri


7,413 post(s)
#20-Apr-21 08:03

I have yet to find anything (GDAL included) that will allow a filtering of the data (in other words, all these other functions have to bring in everything, including the 0's). You should really tout this more.

Don't quite follow what you mean about the filtering... could you expand a bit on the above?

tjhb
10,094 post(s)
#20-Apr-21 08:10

I think Art means that the new TileMaskRange function, and the extension to TileToValues* to exclude masked pixels, have no equivalent anywhere else.

danb

2,064 post(s)
#08-Sep-21 01:16

I just wanted to briefly acknowledge the dedication and hard work of the Manifold team. On and off I have been tinkering with a process to produce indicative coastal inundation layers for our hazards team, using coastal LiDAR to produce a series of ‘flood’ extents at a range of inundation heights above mean sea level.

https://waikatoregion.maps.arcgis.com/apps/MapSeries/index.html?appid=f2b48398f93146e8a5cf0aa3fddce92c

> Coastal Inundation tab

I started off with a script from one of our ArcMap users which processed the sample dataset (~19,500 x 35,500 px) and produced the 30 inundation layers in ~60 hours. Once run, the DEM often required fixing and the process rerunning due to ‘leaks’ resulting from pixel generalisation along river embankments. Anyway, following the addition of the distance toolset, I ported the process to Manifold 9, which immediately slashed the processing time to around 45 minutes.

Following on from this thread, the TileMaskRange function was introduced. I had earmarked this to try as an alternative in the inundation script and with the recent lockdown in NZ have finally found the time to incorporate it.

This function and some other simplifications available in recent builds have not only made the script much simpler, but have also brought the total runtime, using the same data and methodology as the ArcPy original, down to a much more respectable 15 minutes (a ~240x speedup).
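For anyone curious, the per-height step boils down to something like this sketch (the table and field names are invented and the TileMaskRange flag order is assumed; this is not the actual script):

-- keep DEM pixels at or below the inundation height visible and mask
-- out everything above it, producing one 'flood extent' raster per height
VALUE @height FLOAT64 = 1.5; -- metres above mean sea level
SELECT [X], [Y],
  TileMaskRange([Tile], -10000, @height, TRUE) AS [Inundated]
FROM [DEM];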

Thanks, Manifold. Nice one.

The M9 toolset also allowed me to put together another project which identifies potential leaks before running the inundation tool, so there is no more need to rerun the process.


Landsystems Ltd ... Know your land | www.landsystems.co.nz

danb

2,064 post(s)
#02-Apr-21 05:30

In general, our focus is on making 9 a much better fit for your "Smaller data" scenarios. That's what we are concentrating on currently

I am really glad to hear this is the case, Adam. I love the power of M9 for carving up large datasets, but I would estimate that I probably have an 80:20 split between smaller tasks and those involving large datasets with millions of geometries.

While I appreciate that you are likely having to design for potentially massive datasets, it would be nice if we could gain some of the fluidity of M8 workflows for those more mundane GIS tasks, or those where there is just a bit more data than M8, Arc or Q would normally be comfortable with. For me it is mostly about getting familiar with my data as a workflow develops: rapid filtering and selection, looking for things that don't look quite right, and of course ViewBots, which I seem to mention quite a bit.


Landsystems Ltd ... Know your land | www.landsystems.co.nz

danb

2,064 post(s)
#02-Apr-21 05:21

You can see my logic for subdividing the data here. After completing the join, I knew how many points were in each subdivided polygon, and simply did a sum and grouped by the monkey polygon id.

I use this strategy quite regularly, decomposing complex polygons to convex parts. In fact, the attached screenshot shows an example from just yesterday, where I am trying to build the equivalent of ArcMap's Eliminate function. I need to see what class the polygons adjacent to the small polygons to be eliminated (purple) belong to. If I decompose the larger polygons to convex parts first, the adjacency test is much, much faster.

Attachments:
Clipboard-1.png


Landsystems Ltd ... Know your land | www.landsystems.co.nz

Dimitri


7,413 post(s)
#02-Apr-21 14:31

The biggest Con with the help manual is searching for what you are looking for. It's terrible in my opinion. I think a .chm provides so much greater flexibility. I would hope they could produce something like that. I truly hate the search function on the browser and the index tab.

100% agree. The advice to use Google was good, but inconvenient.

I'm super happy to see that as of today the user manual for 9 has an embedded Google search box at the top of every page and the index/search tabs are gone. The search box searches only the user manual for 9. You can use all Google search syntax in the search box.

Depending on the browser that you're using, make sure to clear cached images and files, so your browser won't try to use old versions of the table of contents frame, etc. In the new Bing browser (a really super browser), that's in their Settings - Clear browsing data sub page.

Google's very good, but the search box will pick up only what is crawled by Google, so it might be a few days before it picks up very new changes. It also depends, of course, on having an Internet connection. But I think that's a fair tradeoff for better quality search.

artlembo


3,400 post(s)
#02-Apr-21 15:13

Yes, you are right - as of today, the search bar does work much better. Could've used that yesterday!

Let's give this another week or so, and see how well it works. This could be sufficient.

BTW, was that change always being planned, or was it in response to what was mentioned earlier?

Dimitri


7,413 post(s)
#02-Apr-21 15:39

That was something always planned, but moved up in priority based on community feedback. :-)

joebocop
514 post(s)
#02-Apr-21 19:38

Do you mean that sufficient Suggestions were received, following the instructions, or that company representatives reading the forum felt compelled?

hphillips
31 post(s)
#02-Apr-21 16:00

It took a couple of tries using the Google search box in the online manual to realize one has to scroll down past all the ad results (a screenful) to get to the search results relevant to Manifold.

adamw


10,447 post(s)
#02-Apr-21 16:04

What were the search terms? The number of ads should be the same as if the search was done from the main Google site.

dchall8
1,008 post(s)
#02-Apr-21 18:13

I searched for layouts and got 4 ads above the Manifold returns.

hphillips
31 post(s)
#02-Apr-21 18:20

'Join' and 'Topology'. A direct search from Google itself gives me definitions at the top, not ads like the search from the Manifold online help. But once past the ads, the results are all totally Manifold-relevant and useful through the Manifold Help entry point.

Attachments:
join.png
topology.png

Dimitri


7,413 post(s)
#02-Apr-21 18:36

Interesting.... I just tried 'Join' and didn't get any ads at all. Likewise for 'topology' or layouts.

The algorithms Google uses to show you ads are hard to predict. It depends on who they think you are, where they think you are, what the time of day is, your personal browsing history, what you've clicked on in the past, what you've watched on TV (if you have a Google-mediated TV setup), what you've watched on YouTube and many other factors, like what you buy online.

Earlier today I saw more ads, but now I can't get them to show at all, even trying search terms you'd think would trigger ads, like

lose weight (no quotes), or real estate or vacation or chocolate.

Just out of curiosity, I Googled

How do I remove ads from Google search results?

... and at the top of the results page was...

About 695,000,000 results (0.71 seconds)

hphillips
31 post(s)
#02-Apr-21 19:14

No ads now for searches on join, topology or layouts, just Manifold results when searching from Manifold online Help. Hopefully it will stay that way!

drtees
203 post(s)
#02-Apr-21 23:01

I recently ran across a similar situation. I have a LiDAR data set containing 2,985,866,401 pixels and I needed to create a basin for a wetland using the Watershed command (yes, this is M8). M8 loaded up the file with no issues, but I know from experience that the data contain a lot of sinks. I use the fill sinks transform to fix that problem first. Once the sinks are filled, M8 is pretty speedy at creating the watersheds and streams. However, most of my experience is with a small subset of a larger LiDAR file. Why do an entire file when you only need something that is perhaps half a mile on a side?

Back to the LiDAR file: I had M8 start the process of filling the sinks, and it went to work. After 136 hours it was only 19% done and probably would have been crunching the data for another couple of weeks (I needed to pause M8 a couple of times).

While M8 was working, I was curious to see what M9 could do with the same LiDAR dataset. In the amount of time (minus a weekend) that I had M8 running, M9 filled the sinks and created several watershed layers, taking no more than one and three-quarter hours per run. Most of those runs were to try expanding the minimum flow values to reduce the number of polygons generated. Basically, M9 did several iterations of watershed areas and lines while M8 was still tying its shoelaces. I will be exporting the data back to M8 because its presentation tools are much better than those in the current version of M9.

I am looking forward to using M9's muscle on future projects. I am also thinking that I will need to upgrade my desktop computer. Two Xeon processors with two logical cores each appeared to be straining my computer resources. My final experiment will be to run M9 on my Mac (six cores) using Parallels to run Win10. Parallels will only take four of the cores. I also stitch together large numbers of aerial photos taken from our drone and learned that my Mac could complete that job long before my office desktop could.
