Correcting line direction
atomek

422 post(s)
#18-Mar-15 16:12

Hi,

I have a Network of roads and a large number (450k) of Paths running over that network. I would like to update Network objects with information if the Paths running over them are going in the same direction (1), in the other direction (0) or in both directions (2). Attached is some dummy data with correct attributes for that. Any ideas appreciated.

Thank you,

Tom

Attachments:
line_direction.map

tjhb
10,094 post(s)
#19-Mar-15 03:48

Try

UPDATE
    (SELECT
        [D].[ID], [D].[Direction],
        [T].[Path directions]
    FROM
        [Network] AS [D]
        LEFT JOIN
        (SELECT
            [ID],
    --        COUNT(DISTINCT [Segment direction]) AS [Check],
            CASE
                WHEN MAX([Segment direction]) IS NULL
                    THEN -1 -- no match
                WHEN MIN([Segment direction]) = -2
                    THEN -2 -- some segment(s) with unknown direction
                WHEN COUNT(DISTINCT [Segment direction]) = 1
                    THEN FIRST([Segment direction]) -- the same
                WHEN COUNT(DISTINCT [Segment direction]) = 2
                    THEN 2 -- mixed
                ELSE -3 -- should not arise
            END AS [Path directions]
        FROM
            (SELECT
                [D1].[ID],
                CASE
                    WHEN Coord([Network section], 0) = Coord([Path segment], 0)
                        THEN 1 -- same
                    WHEN Coord([Network section], 0) = Coord([Path segment], CoordCount([Path segment]) - 1)
                        THEN 0 -- opposite
                    ELSE -2 -- unknown
                END AS [Segment direction]
            FROM
                [Network] AS [D1]
                INNER JOIN
                (SELECT [ID], [Path segment]
                FROM [Paths]
                SPLIT BY Branches(IntersectLine([ID], [ID])) AS [Path segment]
                    -- IntersectLine() does not trigger normalization (or reversal)
                ) AS [D2]
                ON Contains([D1].[Geom (I)], [D2].[Path segment]) -- path segment within network line
            SPLIT BY Branches(IntersectLine([D1].[Geom (I)], [D2].[Path segment])) AS [Network section]
            LEAVING Touches(Centroid([Network section]), [Path segment])
                -- the section of network within the path segment
            )
        GROUP BY [ID]
        ) AS [T]
        ON [D].[ID] = [T].[ID]
    )
SET [Direction] = [Path directions]
;

(I doubt this will be fast.)

Attachments:
Path directions within line.txt
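For anyone following the logic outside Manifold, here is a minimal Python sketch of the two CASE expressions in the query above. It is only an illustration: lines are assumed to be plain lists of (x, y) tuples, and the function names `segment_direction` and `path_directions` are made up for this example, not Manifold functions.

```python
def segment_direction(network_section, path_segment):
    """Compare one network section with one overlapping path segment.

    Both arguments are lists of (x, y) tuples (an assumption for this sketch).
    Returns 1 (same direction), 0 (opposite) or -2 (unknown).
    """
    if network_section[0] == path_segment[0]:
        return 1   # both start at the same vertex
    if network_section[0] == path_segment[-1]:
        return 0   # network start coincides with the path end
    return -2      # start vertex matches neither end of the path segment


def path_directions(segment_directions):
    """Aggregate the per-segment codes for one network line."""
    if not segment_directions:
        return -1  # no match: no path overlies this network line
    if min(segment_directions) == -2:
        return -2  # some segment(s) with unknown direction
    distinct = set(segment_directions)
    if len(distinct) == 1:
        return segment_directions[0]  # all paths agree: 1 or 0
    if len(distinct) == 2:
        return 2   # mixed: paths run both ways
    return -3      # should not arise


network = [(0, 0), (1, 0)]
print(path_directions([
    segment_direction(network, [(0, 0), (1, 0)]),
    segment_direction(network, [(1, 0), (0, 0)]),
]))
```

The example compares one network line with two paths running opposite ways, so the aggregate is the "mixed" code.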

atomek

422 post(s)
#19-Mar-15 15:25

Thank you! It works great with smaller samples, so now I'll throw my 450k dataset at it and we'll see how it goes (I'll probably have to leave it running overnight).

tjhb
10,094 post(s)
#19-Mar-15 16:52

It could be sped up by making a (fairly mild) assumption. It could sample the overlain paths near the centre of each network line, rather than checking every overlain segment.
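As a rough illustration of what "sampling near the centre" could mean, the Python sketch below picks the middle segment of a line, rounding towards the start when the vertex count is odd. It assumes a line is a list of (x, y) tuples; `middle_segment` is a hypothetical helper, not a Manifold function.

```python
def middle_segment(coords):
    """Return the two vertices of the middle segment of a line.

    coords is a list of (x, y) tuples (an assumption for this sketch).
    With an odd number of vertices the segment nearer the start is chosen.
    """
    if len(coords) < 2:
        raise ValueError("a line needs at least two vertices")
    mid = len(coords) // 2          # zero-based index of the segment's end vertex
    return coords[mid - 1], coords[mid]


line = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0)]
print(middle_segment(line))
```

Only one segment per network line is then compared against the paths, which is where the saving would come from.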

atomek

422 post(s)
#19-Mar-15 17:58

Good. I left the machine running in the office but it would be nice if you shared an updated/quicker version as, even though I sometimes think of myself as well versed with SQL, your query just blows my mind (too many operators which I normally never use).

tjhb
10,094 post(s)
#19-Mar-15 20:39

Try this version. It compares the direction of the middle segment of each line in [Network] (rounding towards the start) with the directions of overlaid lines in [Paths].

(The first version does the same, but for every segment of each line in [Network], aggregating the results. So if you hit a strangely formed line, with reversed segments, then the first version would be more robust, returning 2 for mixed directions.)

UPDATE
    (SELECT
        [E].[ID], [E].[Direction],
        [V].[Path directions]
    FROM
        [Network] AS [E]
        LEFT JOIN
        (SELECT
            [ID],
--            COUNT(DISTINCT [Segment direction]) AS [Check],
            CASE
                WHEN MAX([Segment direction]) IS NULL
                    THEN -1 -- no match
                WHEN MIN([Segment direction]) = -2
                    THEN -2 -- some segment(s) with unknown direction
                WHEN COUNT(DISTINCT [Segment direction]) = 1
                    THEN FIRST([Segment direction]) -- the same
                WHEN COUNT(DISTINCT [Segment direction]) = 2
                    THEN 2 -- mixed
                ELSE -3 -- should not arise
            END AS [Path directions]
        FROM
            (SELECT
                [T].[ID],
                CASE
                    WHEN Coord([Network segment], 0) = Coord([Path segment], 0)
                        THEN 1 -- same
                    WHEN Coord([Network segment], 0) = Coord([Path segment], CoordCount([Path segment]) - 1)
                        THEN 0 -- opposite
                    ELSE -2 -- unknown
                END AS [Segment direction]
            FROM
                (SELECT
                    [ID],
                    NewLine(
                        Coord([ID], Floor(CoordCount([ID]) / 2) - 1), -- since one-based
                        Coord([ID], Floor(CoordCount([ID]) / 2))
                        ) AS [Network segment]
                FROM [Network]
                ) AS [T]
                LEFT JOIN
                [Paths] AS [D]
                ON Touches([D].[ID], [T].[Network segment]) -- with e
            SPLIT BY Branches(IntersectLine([D].[ID], [T].[Network segment])) AS [Path segment]
                -- IntersectLine() does not trigger normalization (or reversal)
            LEAVING Touches([Path segment], Centroid([Network segment]))
                -- The section(s) of path within the network segment
                -- (more than one if the same path reverses on itself)
            ) AS [U]
        GROUP BY [ID]
        ) AS [V]
        ON [E].[ID] = [V].[ID]
    )
SET [Direction] = [Path directions]
;

If you get any -2 or -3 results, please tell me.

(If you get any -1 results, check your data.)

Attachments:
Path directions within line b (sample).txt

tjhb
10,094 post(s)
#19-Mar-15 20:55

(Annoying redundant alias removed.)

Attachments:
Path directions within line b (sample).txt

atomek

422 post(s)
#20-Mar-15 13:24

Thanks for that. Although dummy data worked fine, for some reason reality looks different.

At some point both of them stop with a message "Operation canceled". I took a small sample of real data and watched carefully as it was running and this time it finished successfully but did not give proper output. The map file I'm working with is attached. Some lines got updated, others have not, and there are -2 results.

Attachments:
GlasgowCBD_NetworkCorrentDirectionSample.map

tjhb
10,094 post(s)
#20-Mar-15 20:22

Cool, thanks for the data.

(1)

We need to write a default value for direction, for network lines not overlain with any paths (no match). This is provided in subquery table U, but I forgot to propagate it to the result. (There were no unmatched network lines in the sample data, but that's no excuse.)

Easily fixed, by changing

SET [Direction] = [Path directions]

to

SET [Direction] = Coalesce([Path directions], -1)

(I then set lines with direction -1 to show in light grey.)

That gets that distraction out of the way.

(2)

Now the cases of unknown line direction (-2). I tried the easiest thing first.

Orthogonalizing both [Network] and [Paths] at a resolution (step) of 0.5m "cures" this result: no more unknown directions. (0.1m resolved the issue for all but 3 network lines.) So the issue appears to be that in many cases vertexes in [Network] and [Paths] are very close, but not exactly the same.

I'll try to rework the query/queries to achieve the equivalent of orthogonalization without changing the data. (But there might be a large cost in speed.)

In the meantime you might have a workaround.

I notice you're using a location precision of 1m. If I change the precision to the default 0.000001m, then without orthogonalization, running the query results in no lines being matched (all results -1). On the other hand, again at 0.000001m precision, if we do orthogonalize at 0.5m, we get all relevant lines being matched and no unknown values (-2)—the same result as at 1m precision with 0.5m orthogonalization. So orthogonalization seems to be the key.
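To show why snapping helps, here is a small Python sketch. It assumes that orthogonalization amounts to moving each coordinate to the nearest multiple of the step; that is a simplification for illustration, and `snap` is a made-up helper rather than anything in Manifold.

```python
def snap(coord, step):
    """Snap an (x, y) pair to the nearest multiple of step (assumed behaviour)."""
    x, y = coord
    return (round(x / step) * step, round(y / step) * step)


network_vertex = (100.0, 200.0)
path_vertex = (100.0009, 200.0)   # roughly 0.9 mm away from the network vertex

# Exact comparison fails because the vertices are close but not identical.
print(network_vertex == path_vertex)

# After snapping both to a 0.5 m grid they land on the same grid node.
print(snap(network_vertex, 0.5) == snap(path_vertex, 0.5))
```

Note that two vertices lying either side of a cell boundary can still snap to different grid nodes, so a snap step reduces mismatches but is not a guarantee in itself.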

(3)

"Operation canceled" is interesting. Sometimes this is caused by an attempt to UPDATE data for a specified row that does not exist (a coding bug). It's hard to see how that could arise here. I think sometimes it can also be caused by a lack of memory or some other internal overload. Might have to return to that problem later.

atomek

422 post(s)
#21-Mar-15 16:16

That did the trick. I will run it on Monday on complete data. Thank you

atomek

422 post(s)
#24-Mar-15 14:53

It does a good job, and "Operation canceled" might indeed be a result of too few resources (RAM). It hits the limit, runs for some time like that and then throws the error. If I work subset by subset it seems fine, though a bit more laborious.

tjhb
10,094 post(s)
#24-Mar-15 20:15

I think we can improve on that, managing the memory issue by splitting the task in two. First a SELECT INTO query, then an UPDATE query using the result.

I'll post back soon.

tjhb
10,094 post(s)
#24-Mar-15 20:33

Try running query 1 then query 2.

We could subdivide query 1 further, writing a great heap of data to a fixed table (essentially, to disk) then doing the grouping in a separate query. That might help even more.

[Sorry, I messed up the attachments. Back soon.]

tjhb
10,094 post(s)
#24-Mar-15 20:43

OK so try running either Query 1 then Query 2...

Attachments:
Query 1.txt
Query 2.txt

tjhb
10,094 post(s)
#24-Mar-15 20:46

...or Query 1a, then 1b, then 2b.

Either of these combinations should allow RAM to be managed more efficiently than the single-bite query.

(I think I've got some way towards finding out why orthogonalization is necessary--and how it can be avoided.)

Attachments:
Query 1a.txt
Query 1b.txt
Query 2a.txt

atomek

422 post(s)
#25-Mar-15 12:44

Oh, that's nice. Working on a subset I see that the first query (1a) takes 99% of the time while the remaining two run 'instantly'. When I threw all my data at it I got 'Operation canceled' while query 1a was performing the 'split by', 1h30min after the query started (it did the 'join' without problems). It's not a memory issue, as I can see there are still resources available. Can query 1a be decomposed further and the 'split by' operation isolated?

tjhb
10,094 post(s)
#25-Mar-15 15:31

You poor fellow! I feel your pain.

Yes it can be split further.

But are there any points in the network or paths drawing? Or any lines with zero length?

And what location precision are you running at? Are there any lines (in either drawing) shorter than that?

atomek

422 post(s)
#25-Mar-15 17:17

1) no points in either drawing

2) no zero lengths

3) I'm running at 1m precision

4) shortest line in Network is 1m length

It's a bit odd because when I took a larger chunk of 50k objects it threw that error, but when I worked through 20k after 20k and so on (I did 6 chunks like that so far) I surely passed through all of those 50k, with no errors this time.
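The chunk-by-chunk workaround can be planned with a few lines of Python. This is only a sketch: it assumes the network objects have integer IDs that can be listed up front, and the way each batch is then fed to the query (for example as an ID range in a filter) is left open. `chunks` is a hypothetical helper.

```python
def chunks(ids, size):
    """Yield consecutive slices of ids, each at most size items long."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


network_ids = list(range(1, 50001))   # stand-in for 50k network object IDs

for batch in chunks(network_ids, 20000):
    # Each batch would be processed by one run of the query.
    print(batch[0], batch[-1], len(batch))
```

Keeping each batch at a size that is known to complete means the intermediate table built for that run stays correspondingly smaller.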

tjhb
10,094 post(s)
#25-Mar-15 17:36

Thanks. You checked your TEMP space during processing, as well as your RAM?

adamw


10,447 post(s)
#26-Mar-15 17:01

If you can reproduce the "operation canceled" (which is unexpected) on a subset of your data (I know, this isn't a given) and either post it here or drop a note on how to get it to tech@, we could run the query and determine exactly what's going on.

atomek

422 post(s)
#30-Mar-15 10:37

Hi Adam,

Here's a link to my data

https://share.sustrans.org.uk/share/Handlers/AnonymousDownload.ashx?file=3f8d1be3

adamw


10,447 post(s)
#06-Apr-15 15:30

OK.

After many retries and several adjustments to the code / testing environment (the process was taking too long to get to the error under the debugger), I can say that (a) the "Operation canceled" error message is fake, that's not the real error, but (b) the real error is legit - the internal table created by SPLIT BY runs into a hard limit of 4 GB of data in one of the internal memory structures for that table (too many records), which can't be overcome easily.

I know this doesn't help much, but at least there're no serious bugs (although the "Operation canceled" message is misleading). Thanks a lot for the file.

atomek

422 post(s)
#07-Apr-15 09:21

ah,

I hope this new lead will be somehow helpful for the development of Manifold

tjhb
10,094 post(s)
#26-Mar-15 22:48

...I've got some way towards finding out why orthogonalization is necessary...

This is probably not a pressing issue now, but the answer is pretty much as expected.

In many cases, vertices of lines in [Paths] are not exactly coincident with vertices in [Network]. In those cases, IntersectLine(path line, network segment) "misses": the intersection points are not at the ends of the network segment, as the code assumes in order to compare directions. We always get some intersection(s), but in unpredictable location(s).

Orthogonalization of both drawings generally fixes that, correctly on the sample data I've checked. To know whether it works correctly in absolutely all cases, we would need to check the actual data, to make sure that all vertices in [Paths] lie on lines in [Network] after orthogonalization.
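One way such a check could be done outside Manifold is sketched below in Python: for each path vertex, measure the distance to the nearest segment of a network line and report the vertices that are further away than a tolerance. Lines are assumed to be lists of (x, y) tuples in metres; both function names and the tolerance values are illustrative only.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment from a to b."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.hypot(px - ax, py - ay)   # degenerate segment
    # Projection parameter of p onto the segment, clamped to [0, 1].
    t = ((px - ax) * dx + (py - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def off_network_vertices(path, network_line, tolerance):
    """Return path vertices further than tolerance from every network segment."""
    segments = list(zip(network_line, network_line[1:]))
    return [p for p in path
            if all(point_segment_distance(p, a, b) > tolerance
                   for a, b in segments)]


network_line = [(0.0, 0.0), (10.0, 0.0)]
path = [(0.0, 0.0009), (5.0, 0.005), (10.0, 0.0)]   # offsets of 0.9 mm and 5 mm
print(off_network_vertices(path, network_line, 0.001))
```

An empty result at a given tolerance would suggest every path vertex sits on the network line to within that distance.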

Pre-orthogonalized examples are attached. The distances between vertices here are about 0.9mm (1) and 5mm (2).

Attachments:
1.png
2.png
