Subscribe to this thread
Home - General / All posts - Advice on threads
artlembo


3,299 post(s)
#29-Nov-22 23:01

I'm working with a 37GB dataset of about 200 million points. I have an empty geom field, that I need to update. Before doing things over and over, I wonder if anyone knows if using THREADS helps an UPDATE statement (part of me thinks it is I/O bound because it has to write the table). But, another part of me thinks it is embarrassingly parallel because each record is independent of one another.

The base query would look like this:

UPDATE TABLE [mergedtable]

SET geom = GeomMakePoint(VectorMakeX2(lon, lat));

but, would using THREADS make it faster?

UPDATE (

SELECT [mfd_id], [geom],

(GeomMakePoint(VectorMakeX2(lon,lat))) AS [geom New]

FROM CALL Selection([mergedtable], TRUE) THREADS SystemCpuCount()

) SET [geom] = [geom New];

I'm not being lazy, it just take a long time (obviously), so before running the second query as a test, I wondered if anyone had thoughts on it.

artlembo


3,299 post(s)
#30-Nov-22 00:48

wow, and how!!!

I decided to do it on a smaller dataset of 37 million points using my Microsoft Surface (only 4 cores). The first query ran in 985 seconds. The second query ran in 253 seconds!! Can't believe I'm saying this, but I can't wait to go to work tomorrow to try it out on the bigger dataset.

BTW, sorry, but for some reason, the code format isn't working. But, here is an update of the SQL:

UPDATE (

SELECT [mfd_id], [geom],

(GeomMakePoint(VectorMakeX2(lon,lat))) AS [geom New]

FROM [mergetable] THREADS SystemCpuCount()

) SET [geom] = [geom New];

danb


1,982 post(s)
#30-Nov-22 23:25

Nice!

I would love some instruction on how to effectively use threads. I tend to just leave it to Manifold but have found on occasion, either a significant speedup through their use and even through modifying the code not to use multiple threads.

I'd be interested to hear where you get to.


Landsystems Ltd ... Know your land | www.landsystems.co.nz

adamw


10,281 post(s)
#01-Dec-22 13:52

Threads specified by THREADS (we also allow controlling threads in the XxxPar functions via the last parameter) mostly help when the result fields of the SELECT include some computations. There are other scenarios where THREADS might help, but that's the main one.

(To be honest, I am a little surprised to see that GeomMakePoint(VectorMakeX2(..)) is apparently heavy enough for THREADS to have a big effect: ~40k records updated per second -> ~160k per second, that's big. But I guess it's possible. If we were talking about the query in the first post that updates Selection(...) I'd theorize that maybe we are seeing multiple threads also help with record lookup -- the table exposed by Selection(...) is virtual = lookup takes some time = threads help because record lookup runs in parallel with everything else and becomes free, to an extent. But [mergetable] looks like a static table. Although it could be a query or perhaps just very big, so the points on THREADS helping because record lookup is noticeable stand.)

Manifold User Community Use Agreement Copyright (C) 2007-2021 Manifold Software Limited. All rights reserved.