Help in creating a simple SQL expression
drtees
204 post(s)
#22-Aug-23 21:34

I was learning SQL on my own during the past two months, but got sidetracked by family issues. I downloaded LiDAR data from the Washington State LiDAR Portal with the intent of creating a new, up-to-date contour image. The LiDAR file, however, appears to be incompletely processed. There are visible horizontal and vertical lines in the image instead of a relatively smooth-looking surface. Contours made from this file reflect those horizontal and vertical lines.

I decided to download the point cloud data in order to recreate the LiDAR surface. The point cloud data include all laser returns. What I would rather show is the last return, which should be (if I am not mistaken) the ground surface elevation. There are two fields in the point cloud table (named kc_1877_rp); return_number and number_of_returns. What I am trying to do is create a simple query that selects records where return_number = number_of_returns. So far, all of the permutations I have tried stemming from the training videos, Manifold 9 documentation, "SQL for Dummies", and ChatGPT end up with the error message "Cannot Parse Expression." It would seem simple enough.

One example I have tried is:

SELECT * from [kc_1877_rp] WHERE [return_number] = [number_of_returns]

I am obviously missing some step here and would appreciate any assistance in correcting the query.

Dimitri


7,449 post(s)
#23-Aug-23 06:12

Could you post the schema for the table / drawing you are using?

drtees
204 post(s)
#23-Aug-23 15:19

Here is the schema for the table.

Dimitri


7,449 post(s)
#23-Aug-23 18:03

I don't see anything obvious, so maybe it's a typo. Are you linking or importing the lidar file? If you linked it, could it be that you forgot that a reference to a table in an external data source has to have the full name, that is in [data source]::[table name] form?
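For example, if the linked data source took the default name kc_1877_rp (the actual data source name depends on how you linked the file, so adjust accordingly), the fully qualified query would look like:

SELECT * FROM [kc_1877_rp]::[kc_1877_rp]
WHERE [return_number] = [number_of_returns];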

Here's what I did (you can download the same file and duplicate what I did): from the Release 9 Product Downloads page I downloaded the pentagon_laz.zip map. That's the same one used in the LAS, LAZ topic example (which should be read). I then unzipped it and linked the pentagon.laz file into a project as a data source.

I chose View - New Command Window - SQL to launch a command window. I then dragged and dropped the pentagon table into the lower right pane of the query builder, so I could double-click the name of the table and the fields to enter them into the query text without typographic errors.

I wrote the query:

SELECT * FROM [pentagon]::[pentagon]
WHERE [return_number] = [number_of_returns]
AND [return_number] > 2;

And ran it. It ran fine, as you can see from the screen shot. I added the condition

AND [return_number] > 2

to reduce the size of the results, so running the query just to try it out would go faster.

By the way, if you just want the ground return, if your Lidar data is LAS 1.1 or later you should be able to just use the classification field, where a value of 2 means ground. Esri has a helpful page on that.

You could write a query like

SELECT * INTO [pentagon_ground]
FROM [pentagon]::[pentagon]
WHERE [classification] = 2;

to create a new table with just the ground points.

You could then create a drawing from that table and then interpolate an image. The image below follows the same kriging settings used in the example in the topic. I don't know who created the pentagon laz file, but the "ground" classification I suppose was an attempt to remove structures and anything else above what the ground level would be.

Attachments:
pentagon_lidar.png
pentagon_lidar_ground.png
pentagon_lidar_ground_kriging.png

drtees
204 post(s)
#23-Aug-23 18:54

I see where I might have made my error. I did not link to the data. Rather, I imported it into the project map. I will try linking. I do recall looking at the first fifty rows of data, but recall only seeing zeros in the classification field. More stuff to try.

Thank you, Dimitri!

drtees
204 post(s)
#23-Aug-23 21:29

I linked the las file and ran your script. It worked! Thank you! I also noticed that there are classification flags associated with the data. I am going to use this field to select the ground surface return per the classification codes on the ESRI page.

drtees
204 post(s)
#23-Aug-23 21:41

The SQL script is working and has been running for several minutes now. The file contains 134,700,000 records. Selecting on the Classification field might have been faster!

artlembo


3,410 post(s)
#23-Aug-23 22:51

Make sure you create an index on the field you want to perform the selection.
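In Manifold 9 SQL, adding an index to an imported table can be done with ALTER TABLE. A sketch, assuming the imported table is named kc_1877_rp and using a BTREEDUP index since classification values are not unique (the index name classification_x is just illustrative):

ALTER TABLE [kc_1877_rp] (
  ADD INDEX [classification_x] BTREEDUP ([classification])
);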

drtees
204 post(s)
#24-Aug-23 22:01

One cannot attach an index to a linked table, or at least that option was grayed out. I can attach an index to the las table if it is imported directly into Manifold 9. However, it does not appear to provide any speed benefit when selecting the requested records. Manifold 9 has been crunching away at pasting 69,184,531 records selected from the entire data set (137+ million records) for 30 minutes now.

artlembo


3,410 post(s)
#24-Aug-23 23:03

Inserting or updating 70 million records on a linked table, and chugging away for a while? That might not be too bad. Likely the API for the linked table is the bottleneck.

As an experiment, and to test some of your assumptions, why don't you import the table and then try the query to see how fast it runs. The question will be whether it is 10 times faster or 100 times faster.

Dimitri


7,449 post(s)
#25-Aug-23 06:34

That's probably worth a try, but it may not help: there's a special point cloud index that is used when linking to las/laz, but it has not yet been implemented for point clouds imported into the project. There's also the significant amount of time needed to import from las/laz.

Come to think of it, that's part of the issue with extracting data via a query from the las/laz file: it's no longer just a viewing situation. The data has to be decompressed from las/laz and made available within the project, so I'd expect the system basically has to import it anyway, at least virtually into temp storage, so that the real data is available to the query engine. So you may as well import the data and add relevant indexes, as Art recommends.

las/laz is a great format for interchange and archival storage and for specific uses dedicated to LiDAR, like viewing and like various transformations purpose-built to work with it (as in LAStools), but it's not a high performance format for general purpose GIS things like unrestricted SQL. So if you're going to do a lot of general purpose GIS on the point data set it's probably better to import from las/laz and into a faster data store.

There are two options for that: native Manifold storage, or use a general purpose DBMS that's very fast at handling points, like PostgreSQL.

Manifold's internal DBMS is tuned for general purpose GIS use, where individual objects aren't just points but are often (even usually) lines or areas that consist of many coordinates. That's why 9 is often faster than general purpose DBMS packages like PostgreSQL for general purpose GIS use (as seen in the "Rendering Shootout" videos on the videos page at https://manifold.net/info/videos_gallery.shtml).

But DBMS packages like PostgreSQL were created explicitly to handle very large numbers of records and transactions where each record is tiny. That's what points are, so they're fast at that. Well-written DBMS packages that have had 40+ years of tuning have also become so fast that, even with GIS data that has "fat" records and transactions, they'll often run faster than Manifold's native data store once you get into the 100 to 200 million object range.

In this case, I'd guess a data set of 137 million records consisting just of points and a handful of small attributes in each record to have faster data access when stored in PostgreSQL, especially when you rig up Postgres for parallel processing (not trivial, but doable). That's a special case that plays to Postgres's strengths, where it will shine.
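The parallel-processing side of that is configured in postgresql.conf. A minimal sketch of the settings involved, with illustrative values to be tuned to the machine's core count:

max_worker_processes = 8
max_parallel_workers = 8
max_parallel_workers_per_gather = 4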

If you're doing a lot of this sort of work, you could install a PostgreSQL/PostGIS database and load it up with the las/laz data and then do the queries in the server, linking to the results with 9. Of course, loading up a Postgres database with 137 million records from the las/laz file will be painfully slow, but it will be a one-time pain (kind of like it being a one-time pain to import into the Manifold project).
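Once the points are in the server, the ground extraction is plain PostgreSQL. A sketch, assuming the loaded table and its classification column keep the names kc_1877_rp and classification (the loading step itself, e.g. via PDAL or a text export, is separate):

CREATE INDEX kc_1877_rp_class_idx ON kc_1877_rp (classification);
CREATE TABLE kc_1877_rp_ground AS
  SELECT * FROM kc_1877_rp WHERE classification = 2;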

That obviously doesn't make sense for one-off projects, where it would be better to just bite the bullet, link the data into a 9 project and just run the query. Just like "a watched pot never boils", don't sit there and watch the thing crank away. Run the query when you're headed out the door to go home for the evening so when you come in the next day it will be done and you'll be thinking "oh, that was quick...". :-)

Hope that helps!

drtees
204 post(s)
#06-Sep-23 19:05

It is helpful. I have been "agitating" for some time now to get a SQL database installed on the company server, even one as costly as PostgreSQL! It has not been a high priority for the company to date. I am not certain I want to restrict PostgreSQL to just my computer since I am trying to get the office to use Manifold 9 more widely.

All that said, I have been using Manifold 9 to create LiDAR surfaces for other projects as well. I set up the project to run while I am doing other things, so the wait time is not that much of an issue. I have found that the surface Manifold 9 creates is sometimes better than the canned tiff files from the LiDAR repository, although I can't explain why (fewer weird artifacts such as my aforementioned grid pattern).

Manifold User Community Use Agreement Copyright (C) 2007-2021 Manifold Software Limited. All rights reserved.