In search of an idea to parallelize a process
apo
185 post(s)
#26-Jan-25 15:57

Hi all,

As the title says, I'm looking for an idea.

The aim of my process is to prepare a DEM and a mask to be sent to an external process and take the result back to be finalized.

The external process is memory-intensive and requires the DEM to be broken down into sub-components, which calls for an iteration. The process I adopted mixes a basic SQL approach with C# function calls for specific tasks (iteration, import, export and PowerShell calls). The overall process comprises the following steps (Q = SQL query, S = script, ES = external script):

  • Q1: query that launches the procedure, defines the zone to be covered and calls the S1 iteration script
  • S1: preparation of the necessary components (to meet EXECUTE requirements) and iterative call of the main query Q2
  • Q2: main query, with the following steps

  1. extraction of the part of the global DEM to be processed
  2. resampling of the DEM at the right resolution
  3. export DEM file (call S2 script)
  4. mask creation for areas to be processed, applying mask values (1/0)
  5. export mask file (call S2)
  6. call external powershell command and read messages to log (call ES1)
  7. import result (call S3 script) and inject into component (table / image)
  8. extract internal tiles (remove edge effects)
  9. export final file (call S2)

The process works perfectly, and the run time is dominated by the external process, which uses only 1 CPU.

My next idea is to make several parallel calls to this external process, but I'm faced with the problem that I have to work in the active database, which would imply conflicts with parallel processes.

One possibility would be to rename the temp components using the code of the parallel thread. Perhaps the notion of a temporary database could allow three processes to be parallelized, but that's where my understanding of EXECUTE in another, previously defined temporary database ends.
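
A minimal sketch of the renaming idea, under stated assumptions: the base names ("T DEM EXTRACT" and so on), the " W<n>" suffix and the `TempNames` helper are all hypothetical and not taken from the actual project. It only builds names; the queries would still have to be written so that they use the per-worker names.

```csharp
using System;
using System.Collections.Generic;

static class TempNames
{
    // Hypothetical base names of the scratch components used for one zone.
    static readonly string[] BaseNames =
    {
        "T DEM EXTRACT", "T DEM RESAMPLED", "T MASK", "T RESULT RAW"
    };

    // Appends the worker code so that two parallel workers never share a name.
    public static string For(string baseName, int worker)
    {
        return baseName + " W" + worker;
    }

    // Wraps a component name in square brackets for use in query text.
    public static string Quoted(string name)
    {
        return "[" + name + "]";
    }

    // All scratch component names belonging to one worker.
    public static List<string> AllFor(int worker)
    {
        var names = new List<string>();
        foreach (string baseName in BaseNames)
        {
            names.Add(For(baseName, worker));
        }
        return names;
    }
}
```

With this naming, `TempNames.For("T MASK", 2)` gives `T MASK W2`, so worker 2 never touches the mask component of worker 1.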

Any experience would be greatly appreciated.

a.

apo
185 post(s)
#28-Jan-25 19:30

Some more information about the process and about my tests.

The main goal of the process is to generate rocks, as shown in the attached subset of the map under production.

The trees, screes and rock falls were already generated using M9.

Attachments:
map_subset.png

apo
185 post(s)
#28-Jan-25 19:36

The iteration Query Q1 coupled to the script S1 is as follows.

FUNCTION callloop(@startzoneid INT32, @endzoneid INT32, @extractlevel INT32) NVARCHAR AS SCRIPT [S LOOPEXTRACT] ENTRY 'Script.LoopExtract';

VALUE @looprslt NVARCHAR = callloop(2954,2955, 1);

The call simply injects the first and the last zone IDs to be processed.

The corresponding script first runs a query that sets up the tables needed later, then iterates an EXECUTE query over the zones.

// C#
class Script
{
    // Entry point called from SQL through callloop(); processes zones startid..endid.
    public static string LoopExtract(int startid, int endid, int lvl)
    {
        int loopsnbr = endid - startid + 1;
        int currentzone = 0;
        Manifold.Application app = Manifold.Application;
        app.Log(loopsnbr.ToString());

        using (Manifold.Database db = Manifold.Application.GetDatabaseRoot())
        {
            // INIT TABLE AND IMG COMPONENTS
            string sql0 = "EXECUTE [Q INIT EXTRACTION] ;";
            app.Log(sql0);
            using (db.Run(sql0))
            {
                app.Log("init comp done");
            }

            // Run the main query once per zone.
            for (int i = 0; i < loopsnbr; i++)
            {
                currentzone = startid + i;
                string sql = "EXECUTE WITH (@param_zone INT32 = " + currentzone + ", @param_lvl INT32 = " + lvl + ") [Q EXTRACT ZONE] ;";
                app.Log(sql);
                using (db.Run(sql))
                {
                    app.Log("loop " + i.ToString());
                }
            }
        }

        app.OpenLog();
        return "done in " + loopsnbr.ToString() + " loops.";
    }

    static Manifold.Context Manifold;

    static void Main()
    {
    }
}

This part works fine.

apo
185 post(s)
#28-Jan-25 19:44

In the main query Q2 all the steps are fine. I will only detail the PowerShell call here.

This call is based on a SQL function linked to the script ES1:

FUNCTION runps(@command NVARCHAR) NVARCHAR AS SCRIPT [S PS] ENTRY 'Script.runps';

VALUE @p NVARCHAR = runps(CAST(@zone AS NVARCHAR));

The only parameter sent is the ID of the current zone, so that the external app targets the right DEM and mask files.

The script is:

// C#
// $reference: system.dll
using System;
using System.Diagnostics;

class Script
{
    // Runs the external program for one zone and returns the name of the file it saved.
    public static string runps(string command)
    {
        Manifold.Application app = Manifold.Application;

        var processStartInfo = new ProcessStartInfo();
        processStartInfo.WorkingDirectory = "C:\\ps";
        processStartInfo.FileName = "powershell.exe";
        processStartInfo.Arguments = "-Command \".\\ps.exe -l 15 -m c:\\ps\\dem\\mask_" + command + ".grd c:\\ps\\dem\\dem_" + command + ".grd\"";
        processStartInfo.UseShellExecute = false;
        processStartInfo.RedirectStandardOutput = true;

        var process = new Process();
        process.StartInfo = processStartInfo;
        process.Start();

        //string output = process.StandardOutput.ReadToEnd();
        string rslt = "none";

        // Copy the output to the log line by line, skipping progress lines.
        while (!process.StandardOutput.EndOfStream)
        {
            string line = process.StandardOutput.ReadLine();
            if (line.Contains("complete") == false)
            {
                app.Log(line);
                if (line.Contains("rock_hachures"))
                {
                    // Strip the prefix and the min/max suffix to keep only the file name.
                    rslt = line.Replace("save_img: ", "");
                    rslt = rslt.Replace(" min 0.000000 max 1.000000", "");
                    app.Log("filename:**" + rslt + "**");
                }
            }
        }

        // Make sure the external process has exited before returning.
        process.WaitForExit();
        return rslt;
    }

    static Manifold.Context Manifold;

    static void Main()
    {
    }
}

This part also works fine, but the zones are processed one after the other, which takes a long time for complex DEM structures. That is why I want to parallelize this call.
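
One possible way to sketch this, separate from the SQL route: because the external step only reads the exported files for its zone, the external calls could in principle be issued from a script with a bounded number of them in flight. In the sketch below, `ZoneRunner`, `RunZones` and the `runOne` delegate are hypothetical names; `runOne` stands in for the process launch shown above. It assumes the external program tolerates several instances running in the same working directory and that nothing inside `runOne` touches shared database components, neither of which has been verified here.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

static class ZoneRunner
{
    // Calls runOne once per zone id, with at most maxParallel calls running
    // at the same time, and collects the returned strings by zone id.
    public static IDictionary<int, string> RunZones(
        IEnumerable<int> zones, int maxParallel, Func<int, string> runOne)
    {
        var results = new ConcurrentDictionary<int, string>();
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel };

        // Parallel.ForEach blocks until every zone has been processed.
        Parallel.ForEach(zones, options, zone =>
        {
            results[zone] = runOne(zone);
        });

        return results;
    }
}
```

A call such as `ZoneRunner.RunZones(new[] { 2954, 2955 }, 4, zone => LaunchExternal(zone))`, where `LaunchExternal` is a placeholder for the process launch, would return once both zones are done. The import and finalization steps could then run sequentially on the collected file names.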

apo
185 post(s)
#28-Jan-25 19:53

My first parallel tests use a SELECT approach to process several records in parallel. I had to set THREADS to the number of parallel calls I wanted (4) and BATCH to 1 in order to force it to use the 4 slots in parallel.

This works fine for the call, but requires the temporary tables (extracted DEM, resampled DEM, imported raw image, etc.) to each be handled using either

  • a temporary database or
  • different temporary table names

to avoid conflicts.

My next step is to try using temporary databases in parallel.

oeaulong

540 post(s)
#01-Feb-25 22:01

Thanks for the code. Some nice examples.

Manifold User Community Use Agreement Copyright (C) 2007-2021 Manifold Software Limited. All rights reserved.