C++ Programming Language: Extensions for Inclusive Scan in C++26


Inclusive scan solves problems involving range queries, such as computing the sum of a range of elements in an array. It is also used in range minimum queries and various other algorithms.




Rainer Grimm has been working as a software architect, team lead, and training manager for many years. He enjoys writing articles on the programming languages C++, Python, and Haskell, and also frequently speaks at expert conferences. On his blog Modern C++, he writes in depth about his passion, C++.



Let’s accomplish some great things together: From December 1st to 24th, if you book one of my mentorship programs, I will donate half the money to ALS research.

I’ll post an update every week so we can see what we’ve accomplished so far.

Here you can find more information about me, the structure of my mentoring programs, and the individual programs:

If you or your team would like a special selection from my consulting program, please contact me at rainer.grimm@ModernesCpp.de. We will find a solution.

Here is a statement from the ALS outpatient clinic:

ALS is one of the “orphans of medicine” – rare diseases that do not receive society’s attention. There is a lack of treatment and currently not enough funding for ALS research. Donations and other third-party funds are of fundamental importance.

The Ice Bucket Challenge came about because of this underfunding.

More money is needed for research.

Before I discuss the asynchronous inclusive scan, I would like to introduce the inclusive scan, also known as prefix sum.

The English-language Wikipedia gives the following definition: “In computer science, the prefix sum, cumulative sum, inclusive scan, or simply scan of a sequence of numbers x0, x1, x2, ... is a second sequence of numbers y0, y1, y2, ..., the sums of prefixes (running totals) of the input sequence:”

    y0 = x0
    y1 = x0 + x1
    y2 = x0 + x1 + x2
    ...
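
The definition maps directly onto std::inclusive_scan from the header <numeric>, available since C++17. Here is a minimal sketch; the helper name running_totals is only an illustration and does not appear in the proposal:

```cpp
#include <numeric>
#include <vector>

// Hypothetical helper: returns the inclusive scan (prefix sums) of xs.
std::vector<int> running_totals(const std::vector<int>& xs) {
  std::vector<int> ys(xs.size());
  // ys[k] = xs[0] + xs[1] + ... + xs[k]
  std::inclusive_scan(xs.begin(), xs.end(), ys.begin());
  return ys;
}
```

For the input {1, 2, 3, 4}, the helper returns {1, 3, 6, 10}.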

The proposal P2300R10 contains an implementation of an asynchronous inclusive scan.

using namespace std::execution;

sender auto async_inclusive_scan(scheduler auto sch,                          // 2
                                 std::span<const double> input,               // 1
                                 std::span<double> output,                    // 1
                                 double init,                                 // 1
                                 std::size_t tile_count)                      // 3
{
  std::size_t const tile_size = (input.size() + tile_count - 1) / tile_count;

  std::vector<double> partials(tile_count + 1);                               // 4
  partials[0] = init;                                                         // 4

  return just(std::move(partials))                                            // 5
       | continues_on(sch)
       | bulk(tile_count,                                                     // 6
           [ = ](std::size_t i, std::vector<double>& partials) {              // 7
             auto start = i * tile_size;                                      // 8
             auto end   = std::min(input.size(), (i + 1) * tile_size);        // 8
             partials[i + 1] = *--std::inclusive_scan(begin(input) + start,   // 9
                                                      begin(input) + end,     // 9
                                                      begin(output) + start); // 9
           })                                                                 // 10
       | then(                                                                // 11
           [](std::vector<double>&& partials) {
             std::inclusive_scan(begin(partials), end(partials),              // 12
                                 begin(partials));                            // 12
             return std::move(partials);                                      // 13
           })
       | bulk(tile_count,                                                     // 14
           [ = ](std::size_t i, std::vector<double>& partials) {              // 14
             auto start = i * tile_size;                                      // 14
             auto end   = std::min(input.size(), (i + 1) * tile_size);        // 14
             std::for_each(begin(output) + start, begin(output) + end,        // 14
               [&] (double& e) { e = partials[i] + e; }                       // 14
             );
           })
       | then(                                                                // 15
           [ = ](std::vector<double>&& partials) {                            // 15
             return output;                                                   // 15
           });                                                                // 15
}

Here is the explanation from the proposal P2300R10:

  1. The function scans a sequence of doubles (represented as the std::span<const double> input) and stores the result in another sequence of doubles (represented as the std::span<double> output).
  2. It takes a scheduler, which specifies the execution resource on which the scan should be launched.
  3. It also takes a tile_count parameter that controls the number of execution agents that will be spawned.
  4. First, we need to allocate the temporary storage required by the algorithm, which we do with a std::vector, partials. We need one double of temporary storage for each execution agent we create.
  5. Next, we create our initial sender with execution::just and execution::continues_on. These senders send the temporary storage, which we have moved into the sender. The sender has a completion scheduler of sch, which means the next item in the chain will use sch.
  6. Senders and sender adaptors support composition via operator|, similar to C++ ranges. We use operator| to attach the next piece of work, which spawns tile_count execution agents using execution::bulk.
  7. Each agent calls a std::invocable, passing it two arguments. The first is the agent's index (i) in the execution::bulk operation, in this case a unique integer in [0, tile_count). The second argument is what the input sender sent: the temporary storage.
  8. We start by computing the start and end of the range of input and output elements that this agent is responsible for, based on our agent index.
  9. Then we do a sequential std::inclusive_scan over our elements. We store the scan result for our last element, which is the sum of all of our elements, in our temporary storage partials.
  10. After all computation in this initial execution::bulk pass has completed, every one of the spawned execution agents will have written the sum of its elements into its slot in partials.
  11. Now we need to scan all of the values in partials. We do this with a single execution agent that runs after the execution::bulk completes. We create this execution agent with execution::then.
  12. execution::then takes an input sender and a std::invocable and calls the std::invocable with the value sent by the input sender. Inside our std::invocable, we call std::inclusive_scan on partials, which the input sender sends to us.
  13. We then return partials, which the next phase will need.
  14. Finally, we run another execution::bulk of the same shape as before. In this execution::bulk, we use the scanned values in partials to integrate the sums from the other tiles into our elements, completing the inclusive scan.
  15. async_inclusive_scan returns a sender that sends the output std::span<double>. A consumer of the algorithm can chain additional work that uses the scan result. At the point at which async_inclusive_scan returns, the computation may not have completed. In fact, it may not have even started.
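
The three phases from steps 6 to 14 can be traced without any sender machinery. The following is a minimal sequential sketch, assuming int elements; the function name tiled_inclusive_scan, the guard for empty tiles, and the fallback for a tile count of zero are illustrative additions and not part of P2300R10:

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical sequential stand-in for async_inclusive_scan: the same three
// phases, with plain loops where the proposal uses bulk and then.
std::vector<int> tiled_inclusive_scan(const std::vector<int>& input, int init,
                                      std::size_t tile_count) {
  tile_count = std::max<std::size_t>(tile_count, 1);  // assumption: at least one tile
  const std::size_t tile_size = (input.size() + tile_count - 1) / tile_count;

  std::vector<int> output(input.size());
  std::vector<int> partials(tile_count + 1, 0);
  partials[0] = init;

  // Phase 1 (first bulk): scan each tile locally and record the tile total.
  for (std::size_t i = 0; i < tile_count; ++i) {
    const std::size_t start = std::min(input.size(), i * tile_size);
    const std::size_t end = std::min(input.size(), (i + 1) * tile_size);
    if (start == end) continue;  // empty tile: its total stays 0
    std::inclusive_scan(input.begin() + start, input.begin() + end,
                        output.begin() + start);
    partials[i + 1] = output[end - 1];
  }

  // Phase 2 (then): scan the tile totals. Afterwards partials[i] holds
  // init plus the sum of all tiles before tile i.
  std::inclusive_scan(partials.begin(), partials.end(), partials.begin());

  // Phase 3 (second bulk): add each tile's offset to its elements.
  for (std::size_t i = 0; i < tile_count; ++i) {
    const std::size_t start = std::min(input.size(), i * tile_size);
    const std::size_t end = std::min(input.size(), (i + 1) * tile_size);
    std::for_each(output.begin() + start, output.begin() + end,
                  [&](int& e) { e += partials[i]; });
  }
  return output;
}
```

With 30 elements and a tile_count of 4, tile_size is 8, so the tiles cover the index ranges [0, 8), [8, 16), [16, 24), and [24, 30). In the asynchronous version, the iterations of the first and third loop are the units of work that the execution agents of execution::bulk may run concurrently.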

Senders

  • just(values): Returns a sender with no completion schedulers that sends the provided values. just is a sender factory.
  • bulk(input, shape, call): Returns a sender describing the callable call invoked on input according to shape.
  • continues_on(input, scheduler): Returns a sender describing the transition from the execution agent of the input sender to the execution agent of the target scheduler.
  • then(input, call): Returns a sender describing the task graph of the input sender, with an added node that invokes the provided function call with the values sent by the input sender as arguments.

Wouldn’t it be great to see this program in action? Currently (December 2024), no compiler supports std::execution or the concepts sender and scheduler.

The reference implementation stdexec helps here. I also changed the data type of the processed elements from double to int:

// inclusiveScanExecution.cpp

#include <algorithm>
#include <exec/static_thread_pool.hpp>
#include <iostream>
#include <numeric>
#include <span>
#include <stdexec/execution.hpp>
#include <vector>

auto async_inclusive_scan(auto sch,                   // 2
                          std::span<const int> input, // 1
                          std::span<int> output,      // 1
                          int init,                   // 1
                          std::size_t tile_count)     // 3
{
  std::size_t const tile_size = (input.size() + tile_count - 1) / tile_count;

  std::vector<int> partials(tile_count + 1); // 4
  partials[0] = init;                        // 4

  return stdexec::just(std::move(partials)) // 5
         | stdexec::continues_on(sch) |
         stdexec::bulk(tile_count,                                      // 6
                       [=](std::size_t i, std::vector<int> &partials) { // 7
                         auto start = i * tile_size;                    // 8
                         auto end =
                             std::min(input.size(), (i + 1) * tile_size); // 8
                         partials[i + 1] =
                             *--std::inclusive_scan(begin(input) + start,   // 9
                                                    begin(input) + end,     // 9
                                                    begin(output) + start); // 9
                       }) // 10
         | stdexec::then( // 11
               [](std::vector<int> &&partials) {
                 std::inclusive_scan(begin(partials), end(partials), // 12
                                     begin(partials));               // 12
                 return std::move(partials);                         // 13
               }) |
         stdexec::bulk(
             tile_count,                                                 // 14
             [=](std::size_t i, std::vector<int> &partials) {            // 14
               auto start = i * tile_size;                               // 14
               auto end = std::min(input.size(), (i + 1) * tile_size);   // 14
               std::for_each(begin(output) + start, begin(output) + end, // 14
                             [&](int &e) { e = partials[i] + e; }        // 14
               );
             }) |
         stdexec::then(                         // 15
             [=](std::vector<int> &&partials) { // 15
               return output;                   // 15
             });                                // 15
}

int main() {

  std::cout << '\n';

  std::vector<int> input(30);
  std::iota(begin(input), end(input), 0);
  for (auto e : input) {
    std::cout << e << ' ';
  }
  std::cout << '\n';

  std::vector<int> output(input.size());

  exec::static_thread_pool pool(8);
  auto sch = pool.get_scheduler();

  auto [out] =
      stdexec::sync_wait(async_inclusive_scan(sch, input, output, 0, 4))
          .value();

  for (auto e : out) {
    std::cout << e << ' ';
  }

  std::cout << '\n';
}

In main, a std::vector input with 30 elements is created. The std::iota function fills the input vector with consecutive integers, starting at 0. The program then prints the contents of the vector to the console.

Next, a std::vector output with the same size as the input vector is created to store the results of the inclusive scan. The exec::static_thread_pool pool has eight threads that are used to execute tasks concurrently. The thread pool's member function get_scheduler creates the scheduler object sch.

The function async_inclusive_scan takes the scheduler sch, the vector input, the vector output, an initial value of 0, and a tile count of 4. It performs the inclusive scan asynchronously on the specified scheduler and returns a sender, a future-like object. The function stdexec::sync_wait waits synchronously for the async_inclusive_scan operation to complete, and the result is unpacked into the variable out.

Finally, the program prints the contents of out to the console.
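
One way to sanity-check such a result is to compare it against a sequential std::inclusive_scan over the same input. The sketch below is only an illustration: the helper name matches_sequential_scan is hypothetical, and it takes plain vectors rather than spans so that it does not depend on stdexec:

```cpp
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical check: does `result` equal the inclusive scan of `input`
// with `init` added to every running total?
bool matches_sequential_scan(const std::vector<int>& input, int init,
                             const std::vector<int>& result) {
  std::vector<int> expected(input.size());
  std::inclusive_scan(input.begin(), input.end(), expected.begin(),
                      std::plus<>{}, init);
  return result == expected;
}
```

For the input 0, 1, ..., 29 and an initial value of 0, the k-th expected value is k * (k + 1) / 2, so the sequence starts with 0 1 3 6 10 and ends with 435.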



In my next blog post, I will take a step back and explain the composition of senders using operator|.

