Subscribe to this thread
Home - General / All posts - Select sum of top n records grouped by
Mike Pelletier

2,122 post(s)
#22-Apr-14 00:26

Could someone please assist me with how to select for each road segment the sum of the top highest values in line objects touching it? Attached is a map file with sample data. My job is to create a consistent way to measure the rural feeling along highways. I'm doing that by creating tie lines from address points to the adjacent highway via the shortest line (using one of Tim's queries). I then apply a weighting factor to homes closer to the highway (I made up these weight factors - no science here). Then I break the road into 100' segments and sum the adjusted lengths of the tie lines for each road segment. Here is where it would be good to limit the sum to the top n tie lines since after a point additional homes have little impact. Finally, I average the road segments to smooth out the values along the road. The idea is if it can be measured than folks will value it more. Any thoughts on the overall concept would be great too. Appreciate your time.

Attachments:
t.map

adamw


10,447 post(s)
#22-Apr-14 07:44

The straightforward way of doing this failed due to a limitation of the query engine in Manifold 8, so here's a roundabout way.

Say, you want to sum top 3 highest values. Then, create 1+3+1 queries, as follows:

Q - for each road segment, find tie lines joining into it, to get geometry out of the way:

--SQL

SELECT R.ID RID, T.ID TID, T.[Adjusted Length] L

FROM [Rural] R INNER JOIN [Tie lines 5280'] T ON Touches(R.ID, T.ID)

Q1 - for each road segment, find highest value for a tie line joining into it:

--SQL

SELECT R.ID,

 (SELECT TOP 1 TID T1, L L1 FROM Q WHERE R.ID=Q.RID ORDER BY L DESC)

FROM [Rural] R

Q2 - for each road segment, find 2 highest values for tie lines joining into it:

--SQL

SELECT R.ID, R.T1, R.L1,

 (SELECT TOP 1 TID T2, L L2 FROM Q WHERE R.ID=Q.RID AND R.T1<>Q.TID ORDER BY L DESC)

FROM [Q1] R

Q3 - for each road segment, find 3 highest values for tie lines joining into it:

--SQL

SELECT R.ID, R.T1, R.L1, R.T2, R.L2,

 (SELECT TOP 1 TID T3, L L3 FROM Q WHERE R.ID=Q.RID AND R.T1<>Q.TID AND R.T2<>Q.TID ORDER BY L DESC)

FROM [Q2] R

QS - sum highest values:

--SQL

SELECT R.ID, R.L1+R.L2+R.L3 L FROM [Q3] R

Hope this helps.

I guess you can do everything after Q easily with a script, without losing too much performance. The benefit would be that you'd only need one script component no matter how many highest values you'd like to sum.

Mike Pelletier

2,122 post(s)
#22-Apr-14 16:13

Thanks a lot for this Adam. Although I'd prefer no query engine limitation, it's a bit gratifying that after banging away on this for quite awhile, the solution wasn't simple.

When googling for an answer on this, I ran across a post that someone uses this question as a test for employment (non-Manifold SQL engines). The poster said only 5% or so can get it even with no time limit. You passed!

The script option sounds appealing since I was thinking of n = 20. My scripting skills are rather rusty so if there is a quick bit of code you could post that would get me going it would be much appreciated.

adamw


10,447 post(s)
#22-Apr-14 16:52

Here's a slightly modified query Q:

--SQL

SELECT R.ID RID, T.ID TID, T.[Adjusted Length] L

FROM [Rural] R INNER JOIN [Tie lines 5280'] T ON Touches(R.ID, T.ID)

ORDER BY RID, L DESC

And a script:

'VBScript

Function CreateResultTable

  Set columnSet = Application.NewColumnSet

  Set column = columnSet.NewColumn

  column.Name = "RID"

  column.Type = ColumnTypeInt32

  columnSet.Add column

  Set column = columnSet.NewColumn

  column.Name = "SumL"

  column.Type = ColumnTypeInt32

  columnSet.Add column

  Set CreateResultTable = Document.NewTable("Result", columnSet)

End Function

 

Sub AddResultRecord(recSet, rid, sumL)

  Set rec = recSet.NewRecord

  rec.Data("RID") = rid

  rec.Data("SumL") = sumL

  recSet.Add rec

End Sub

 

Sub Main

  Set tableOut = CreateResultTable

  Set recsOut = tableOut.RecordSet

  Set tableIn = Document.ComponentSet("Q").Table

  Set recsIn = tableIn.RecordSet

  lastCount = 0

  lastRID = 0

  lastL = 0

  For Each recIn In recsIn

    thisRID = CLng(recIn.Data("RID"))

    thisL = CLng(recIn.Data("L"))

    If lastCount > 0 And lastRID <> thisRID Then

      AddResultRecord recsOut, lastRID, lastL

      lastCount = 0

      lastL = 0

    End If

    lastRID = thisRID

    If lastCount < 3 Then

      lastL = lastL + thisL

    End If

    lastCount = lastCount + 1

  Next

  If lastCount > 0 Then

    AddResultRecord recsOut, lastRID, lastL

  End If

End Sub

The "n" constant is at line 39 (If lastCount < 3 Then) - the script uses 3.

Running the script creates a new table named "Result" (or "Result 2", "Result 3", etc, if the name is already taken, as usual).

Mike Pelletier

2,122 post(s)
#22-Apr-14 19:46

Wow! Thanks a ton Adam. That script works well and very fast on my data set. You have saved me lots of time and added a very nice addition to my method. It is somewhat surprising but the overall method for measuring "ruralness" along the highways does seem to work relatively well. It will be interesting to see how much traction or impact on public policy comes from this bit of GIS wizardry.

dale

630 post(s)
#23-Apr-14 01:03

Mike,

funny enough, we are working on something that parallels your work. Could you contact me off forum? (see my profile for email addy.)

<edit> off topic. We need a forum beverage escrow service to send posters a beverage or two. (Adam, Tim, Art...) It's pleasing to see the forum tempo picking up

Dale

Mike Pelletier

2,122 post(s)
#23-Apr-14 15:30

Will do Dale and yes these guys have more than earned it. Even though I mention the help received from all over the world, people still sometimes give me too much credit for my little projects. I think the forum shows how much more fun it is to use Manifold vs other GIS software despite the quite public face of Manifold.

tjhb
10,094 post(s)
#24-Apr-14 09:40

Adam and Mike,

Sorry to be adding to this thread so late--but I still think it's worthwhile so I'm suggesting a third way, in SQL alone.

It's the same basic method as used in the "extend lines to nearest lines" thread (see query [4 Closest end with nearest 3 segments] here), in this case using a parameter to find the TOP N rows.

Query 1 is almost identical to Adam's first query (Q).

-- Query 1

SELECT

    [D1].[ID] AS [Road segment ID],

    [D2].[ID] AS [Address line ID][D2].[Adjusted length]

FROM

    [Rural] AS [D1]

    LEFT JOIN

    [Tie lines 5280'] AS [D2]

    ON Touches([D1].[ID][D2].[ID]-- with e

;

Query 2 calls two instances of query 1 in a self join, matching rows on the left to each row on the right for the same road segment but having a longer tie line. (So in each matched pair we get the shorter line on the left.)

It can then group by road segment and use HAVING to leave only the rows where there are fewer than N matched rows on the right, i.e. where the match on the left is one of the N shortest lengths for the group (the road segment).

The count N is entered as a user parameter (with a default of 10).

The last wrap in query 2 groups again, summing the lengths for the filtered rows (the N shortest lengths, or N closest houses, on this road segment).

-- Query 2

PARAMETERS [Limit (shortest N lines)] LONG;

    -- Default is the 10 shortest lines

SELECT

    [Road segment ID],

    COUNT([Address line ID]AS [Address lines],

--    MAX([Rank]) AS [Check], -- NB always at least 1 (even if there are no incident lines)

    SUM(Coalesce([Adjusted length], 0)) AS [Total length]

FROM

    (SELECT

        [T1].[Road segment ID][T1].[Address line ID]

--        COUNT([T2].[Address line ID]) + 1 AS [Rank], -- one-based index

        [T1].[Adjusted length]

    FROM

        [Query 1] AS [T1] LEFT JOIN [Query 1] AS [T2]

        ON [T1].[Road segment ID] = [T2].[Road segment ID] 

            -- Same section of target line

        AND [T1].[Adjusted length] > [T2].[Adjusted length] 

            -- The left hand address line is longer than the right

            -- (giving shorter lines a lower count)

    GROUP BY 

        [T1].[Road segment ID]

        [T1].[Address line ID][T1].[Adjusted length]

    HAVING COUNT([T2].[Address line ID]) < Coalesce([Limit (shortest N lines)], 10)

    )

GROUP BY [Road segment ID]

;

Query 3 is an UPDATE version of query 2. (Like query 2 it just calls query 1.)

-- Query 3

PARAMETERS [Limit (shortest N lines)] LONG;

    -- Default is the 10 shortest lines

UPDATE

    (SELECT

        [D].[Sum][T].[Total length]

    FROM

        [Rural] AS [D]

        INNER JOIN

        (SELECT

            [Road segment ID],

            SUM(Coalesce([Adjusted length], 0)) AS [Total length]

        FROM

            (SELECT

                [T1].[Road segment ID]

                [T1].[Address line ID][T1].[Adjusted length]

            FROM

                [Query 1] AS [T1] LEFT JOIN [Query 1] AS [T2]

                ON [T1].[Road segment ID] = [T2].[Road segment ID] 

                    -- Same section of target line

                AND [T1].[Adjusted length] > [T2].[Adjusted length] 

                    -- The left hand address line is longer than the right

                    -- (giving shorter lines a lower count)

            GROUP BY

                [T1].[Road segment ID]

                [T1].[Address line ID][T1].[Adjusted length]

            HAVING COUNT([T2].[Address line ID]) < Coalesce([Limit (shortest N lines)], 10)

            )

        GROUP BY [Road segment ID]

        ) AS [T]

        ON [D].[ID] = [T].[Road segment ID]

    )

SET [Sum] = [Total length]

;

Attachments:
Query 1.txt
Query 2.txt
Query 3.txt

adamw


10,447 post(s)
#24-Apr-14 10:13

That's certainly worthwhile - it's much better than the first solution with multiple queries that I proposed, for example, and has an edge over the second solution in that the second solution involves scripts and this one doesn't (scripts are fine, but what if you don't know how to script, for example).

The technique of joining a query with a copy of itself with an ordering condition, then grouping and using COUNT to establish ranking is very useful (and generic), indeed.

As written, the ranking is not a strong one, in that it misbehaves if some of the [adjusted length]s are equal to each other - as a result, for example, if there is a road segment with 11+ equally long tie lines that join into it, the query will use all these tie lines instead of the first 10 - but that's easily fixed by altering the join condition.

tjhb
10,094 post(s)
#24-Apr-14 10:43

True, and thanks for spotting the bug Adam.

...

ON [T1].[Road segment ID] = [T2].[Road segment ID] 

    -- Same section of target line

AND CASE

    WHEN [T1].[Adjusted length] > [T2].[Adjusted length] 

        THEN TRUE

        -- The left hand address line is longer than the right

        -- (giving shorter lines a lower count)

    WHEN [T1].[Adjusted length] = [T2].[Adjusted length] 

        THEN [T1].[Address line ID] > [T2].[Address line ID]

        -- Tiebreaker for equal lengths

        -- (giving lower ID a lower count)

    ELSE FALSE

END

...

Also fixed in attachments.

Attachments:
Query 2.txt
Query 3.txt

Mike Pelletier

2,122 post(s)
#25-Apr-14 17:58

Very useful indeed Tim. Thanks for all your effort on this. I hadn't thought that ordering was possible in the join condition. Neat trick. Thanks too for providing it in an update query.

Unless someone has a better way, I'm going to improve score method to reflect that homes on both sides of road have more impact that on just one side. My thought is to add a column called "Side" and hand select tie lines along the road and mark them left or right. I can then add a where clause to Query 1 to sum just left and right seperatly and then combine the score with some factor such [left score] *1.5 + [right score] = total score.

tjhb
10,094 post(s)
#30-Apr-14 04:53

Mike,

If it's useful, it wouldn't be hard to adapt the code to recognise for each property (tie line) on which side of the road it falls, and then to sum values for left and right properties separately.

I know you can do this manually (which is not difficult either).

Tim

Mike Pelletier

2,122 post(s)
#02-May-14 20:17

Tim,

If you don't mind adapting the code that would be wonderful. Anything to automate the process would be very helpful since I will need to run this many times for various areas, playing with the formulas. If your game (realizing you've already gone above and beyond), I've attached a file where the steps and desires are laid out. Hopefully its clear but feel free to email if you'd like. Dale and I have talked a bit off forum about this as well.

Attachments:
t.map

tjhb
10,094 post(s)
#03-May-14 09:39

Thanks Mike. Looks good. Hope to have something tomorrow.

Manifold User Community Use Agreement Copyright (C) 2007-2021 Manifold Software Limited. All rights reserved.