The "This program will never make it" thread was getting a little too focused on a few specific issues, and while I think there is really good discussion in it, I wanted to open this new thread to address some of the issues I've been thinking about as they relate to 9, and where it fits into people's solutions. Hopefully, others can discuss how they are using 9 and how it fits within their solution set.

First things first: what I call the donut hole

I like 9. It's kinda cool to work with, but like many others have said, it is still a work in progress with some limitations. Also, the interface isn't as polished as I might like.

Smaller data

The reality is, if I have a dataset with 2 million vector features or less (basically any county GIS data), Manifold 8 is still awesome - albeit dated when it comes to reading in certain formats. QGIS is also very easy to work with at 2 million features, and ArcGIS Pro has repackaged things very nicely. All three of these products cut through data of that size really, really nicely. So, personally, I won't likely use 9 to work with data sets under 2M features, as there are more mature products that get the job done. This is where I think most people live.

Insanely large data

Someone once told me: if it fits on your computer, then it's not big data. I tend to agree, but as I deal with GIS professionals as an educator, I like to tell them: if it's more data than you've ever worked with, then it's big data to you. But, for argument's sake, let's go with the first scenario - massive amounts of data. In reality, QGIS, ArcGIS Pro, and Postgres aren't going to cut it, and neither are any of the versions of Manifold. To deal with this, you likely need a server with lots of clients, and in many cases you need to spin up something like Hadoop to make it work. I've been there, done that, and never want to go through that experience again. I'm no rocket scientist, but I'm no dummy either. Even so, standing up a Hadoop cluster took many days.
But, it got the job done. In reality, with my 40-seat lab, I could probably just break the data into 1/40th-size chunks, put one on each computer, and run Postgres on it over a weekend (getting my exercise by running to each computer to hit the run button!). I don't see many people living in this ecosystem, and those who do likely get paid a lot of money to spin up servers.

Very large data - the donut hole

Many people fall into this donut hole: they don't have the insane amount of data that requires Hadoop, but the data is way larger than a few million objects. This is where 9 really shines. I don't know the breaking point yet, but let me give you an idea of how I just used 9.

I'm doing some work with the International Union for Conservation of Nature Red List of Threatened Species, and we are looking at worldwide monkey populations and their relationship to mangroves. The mangrove raster is 330 billion pixels in size (yes, billion). It is a binary data source, so it is rather ridiculous to store the data as a raster, but such as it is. It turns out there are 61 million mangrove points that we can extract from the raster. The monkey polygons are not a particularly large data set, but the number of vertices is insane - oh, my goodness, it's huge! Really intricate, sinuous polygons. We needed to determine the number of mangroves in each of the monkey polygons, and we had to buffer each monkey polygon by different amounts. After working with the data for months, the researchers came to me, because they knew I sort of swim in this ecosystem.

9 was almost perfect for this. Almost - and hopefully what comes next could point to some good additions to the program. I'm going to be writing a formal paper on the methodology, but the following is what I did:

1. Converted the 330 billion pixel raster to 61 million polygons (9 struggled with this, but ArcGIS Pro was able to do it).
2. Converted the polygons to point centroids in 9.
(I didn't even need to write code for this; I just used the GUI.)

3. Used the Join dialog to count the points in the polygons (more on this below).
4. 17 minutes later, the process was done! My colleagues had been running their same command for 3 days.

Getting the overlay to work

Just like the other GIS products, 9 couldn't really run the process as-is - it was still going after 24 hours. So, I used a little trick in PostGIS called ST_Subdivide and turned the 928 monkey polygons into 472,000 polygons with no more than 20 vertices each. You can see my logic for subdividing the data here. After completing the join, I knew how many points were in each subdivided polygon, and simply summed the counts grouped by the original monkey polygon id. That was it.

I can't tell you how much data is actually there, but after everything was done, I had used up over 100GB on my computer (some data is duplicated in Postgres, some in 9, and some in a geodatabase). But yes, a good amount of data.

So, 9 is a perfect tool for that donut hole, converting many days of processing to under 20 minutes - and even turning many months of batting around ideas into about 3 days (I experimented with a few options, so that's what it took me to get my head around this). In this case, 9 was the only way to blast through this data without going DEFCON 1 with Hadoop. So, for $95, you have a screaming software product to deal with lots of data and intricacy.

Things I'd like to see Manifold add:

1. A really good raster-to-vector (point or polygon) tool to convert pixel values to points (the current tool could not get through all the data in any reasonable amount of time).
2. An ST_Subdivide command. This is really critical if you are going to work with big data - sometimes you have to partition the data to give the software a chance. I'd add to that the ability to create tessellations like hexagons, because they are often used to partition a large empty space.
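For anyone who wants to try the same trick, here is a rough PostGIS sketch of the subdivide-count-rollup workflow described above. This is the idea, not the exact SQL I ran; all table and column names (mangrove_raster, monkey_polygons, and so on) are invented for illustration, and the SRID in the last query is an assumption.

```sql
-- 1. Pull a centroid point out of every mangrove pixel with value 1.
--    (In my case, the raster-to-vector step actually ran in ArcGIS Pro
--    and the centroids were made in 9; this is the equivalent idea.)
CREATE TABLE mangrove_points AS
SELECT (pc).geom
FROM (SELECT ST_PixelAsCentroids(rast, 1) AS pc
      FROM mangrove_raster) AS t
WHERE (pc).val = 1;

-- 2. Break the 928 intricate monkey polygons into simple pieces with
--    no more than 20 vertices each, and index them.
CREATE TABLE monkey_parts AS
SELECT id AS monkey_id,
       ST_Subdivide(geom, 20) AS geom
FROM monkey_polygons;

CREATE INDEX ON monkey_parts USING gist (geom);

-- 3. Count points per piece, rolled back up to the original polygon
--    ids, in one GROUP BY.
SELECT p.monkey_id,
       count(*) AS mangrove_count
FROM monkey_parts AS p
JOIN mangrove_points AS m
  ON ST_Intersects(p.geom, m.geom)
GROUP BY p.monkey_id;

-- Bonus: the hexagon tessellation wished for above already exists in
-- PostGIS 3.1+ (size is in the layer's units; SRID 3857 is assumed).
SELECT h.geom
FROM ST_HexagonGrid(
       10000,
       (SELECT ST_SetSRID(ST_Extent(geom), 3857) FROM mangrove_points)
     ) AS h;
```

The point of step 2 is that a spatial index on many small, boring rectangles-ish pieces does far more work for you than one on a few enormous, sinuous polygons, which is why the join suddenly becomes tractable.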
One comment about the Help manual. The biggest con is searching for what you are looking for - it's terrible, in my opinion. I think a .chm file provides so much more flexibility, and I would hope they could produce something like that. I truly hate the search function in the browser and the index tab.

The biggest pro is the detailed information. I never appreciated it more than when I worked on this project. The examples are really great and detailed: I had never used the Join function before, and taking only 10 minutes to read that section and the example of counting cities inside states was all I needed. I'm a reasonably intelligent person, and the help manual treats you as such - it is written with enough detail that if you've never used a function before, it will step you through it rather nicely. But they do expect you to read it. And no, I didn't read hundreds of pages - but I did read the pages I needed to, and took that seriously. (BTW, I'm the guy who jumps right in to fixing a lawnmower without reading the directions. That won't work with 9 - you really do have to read it, and all the info you need is right there.)

My apologies for such a long post, but I hadn't really seen much about how people are actually using 9 in the wild, and where it fits in. Hopefully others can elaborate on their own use-cases.