Subscribe to this thread
Home - General / All posts - M9 Matching and grouping on a column of delimited values
danb

2,073 post(s)
#18-Sep-23 04:09

I have a problem which I have been trying to tackle with SQL, but I just can’t quite get my head round it. In the IDX column, I have a number of delimited identifiers. What I require to be able to do is a bit difficult to explain, but it essentially involves collapsing the IDX rows by checking if any of the identifiers occur in any other IDX rows.

For example, 117 and 118 don’t occur in any of the other IDX rows and so they stay unaltered. Looking at the remaining rows, for matches, 254 occurs in all rows so these would collapse into another row in the result. I don’t care necessarily about retaining the IDX values beyond this point, but in this example the result would be a table with two rows and the union of the geometries.

Hopefully this makes sense? and many thanks in advance for any pointers as it is beyond me J


Landsystems Ltd ... Know your land | www.landsystems.co.nz

tjhb
10,108 post(s)
#18-Sep-23 16:43

254 appears in 4 rows, but likewise, 238, 247 and 253 each appear in 3 rows.

How to account for this? Why not return a table with 5 rows—why only 2?

Why and how is a match among 4 rows to be preferred to one or more matches among a smaller subset of the same rows?

And then, how about overlapping sets of matches (rather than subsets)? For example if there is a 4-way match between rows A, B, C, D and a different 4-way match between rows A, B, F and P. Or a 5-way match among a different set of rows including B and D.

danb

2,073 post(s)
#18-Sep-23 20:58

You are of course correct about the cases that you point out Tim, and this is the nub of my conceptual problem. Thought a SQL effort (as far as I could tell) would produce 5 results (this is how I got to this table), the final "correct" result I am looking for is 2 records only and the union of all the geoms pertaining to these two records by virtue of 254 being common to the latter 4 records.

I think SQL is likely the wrong tool for the job, but I am much more comfortable with SQL than procedural script so thought I would exhaust this option first.

What I am trying to do here is cluster/group polygons according to their size and proximity to each other i.e., if polygons are within a maximum distance of each other and their attraction weights exceed a certain threshold they are merged to become one disjoint multipolygon.


Landsystems Ltd ... Know your land | www.landsystems.co.nz

danb

2,073 post(s)
#21-Sep-23 03:38

For completeness, I think I have found my answer in Adam's script here: https://georeference.org/forum/t144606#144615

This is a great script and not the first time it has got me out of trouble. Thanks Adam


Landsystems Ltd ... Know your land | www.landsystems.co.nz

apo
181 post(s)
#22-Sep-23 06:45

If I get the real sense of the question the following query is doing the job,

you then just have to make an inner join on the FID and group by the GROUPID to have the geom merged

SELECT T1.FID, MIN(T2.FID) AS GROUPFID

FROM (

    SELECT [FID],  SPLIT CALL StringToTokens([IDX]'|')

    FROM [Table] 

AS T1

LEFT JOIN  (

    SELECT [FID],  SPLIT CALL StringToTokens([IDX]'|')

    FROM [Table] 

AS T2

ON T1.vALUE = T2.VALUE

GROUP BY T1.FID

This is not working if you want to merge lines in a cascading way, meaning that you want to merge 3|4 with 4|5 and with 5|6

danb

2,073 post(s)
#22-Sep-23 21:43

Many thanks Apo. I will dig into this once I return to work on Monday. Much appreciated


Landsystems Ltd ... Know your land | www.landsystems.co.nz

Manifold User Community Use Agreement Copyright (C) 2007-2021 Manifold Software Limited. All rights reserved.