An inclusive scan solves problems that involve range queries, such as computing the sum of a range of elements in an array. It is also used in range minimum queries and various other algorithms.
Rainer Grimm has worked as a software architect, team lead, and training manager for many years. He enjoys writing articles on the programming languages C++, Python, and Haskell, and he frequently speaks at expert conferences. On his blog Modernes C++, he writes in depth about his passion, C++.
Christmas special
Make the difference
Let’s accomplish some great things together: From December 1st to 24th, if you book one of my mentorship programs, I will donate half the money to ALS research.

I’ll post an update every week so we can see what we’ve accomplished so far.
Here you can find more information about me, the structure of my mentoring programs, and the individual programs:
If you or your team would like a customized selection from my mentoring programs, please contact me at rainer.grimm@ModernesCpp.de. We will find a solution.
Why?
Here is a statement from the ALS outpatient clinic:
ALS is one of the “orphans of medicine” – rare diseases that do not receive society’s attention. There is a lack of treatment and currently not enough funding for ALS research. Donations and other third-party funds are of fundamental importance.
This underfunding became obvious through the Ice Bucket Challenge.
More money is needed for research.
Prefix sum
Before I discuss the asynchronous inclusive scan, I would like to introduce the inclusive scan, also known as the prefix sum.
The English-language Wikipedia gives the following definition: “In computer science, the prefix sum, cumulative sum, inclusive scan, or simply scan of a sequence of numbers x0, x1, x2, ... is a second sequence of numbers y0, y1, y2, ..., the sums of prefixes (running totals) of the input sequence:”
y0 = x0
y1 = x0 + x1
y2 = x0 + x1 + x2
...
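To make the definition concrete, here is a minimal sketch using std::inclusive_scan from <numeric> (C++17). The helper names prefix_sums and range_sum are hypothetical and chosen only for this illustration; range_sum shows how a range-sum query reduces to a single subtraction once the prefix sums exist.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Returns the inclusive scan (prefix sums) of xs: ys[i] = xs[0] + ... + xs[i].
// Hypothetical helper, for illustration only.
std::vector<int> prefix_sums(const std::vector<int>& xs) {
    std::vector<int> ys(xs.size());
    std::inclusive_scan(xs.begin(), xs.end(), ys.begin());
    return ys;
}

// Sum of the original elements at indices first..last (both inclusive),
// answered in constant time from the prefix sums ys.
// Hypothetical helper, for illustration only.
int range_sum(const std::vector<int>& ys, std::size_t first, std::size_t last) {
    assert(first <= last && last < ys.size());
    return first == 0 ? ys[last] : ys[last] - ys[first - 1];
}
```

For the input 1, 2, 3, 4, 5 the prefix sums are 1, 3, 6, 10, 15, so the sum of the elements at indices 1 to 3 is 10 - 1 = 9.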
Proposal P2300R10 presents an asynchronous implementation of the inclusive scan.
using namespace std::execution;

sender auto async_inclusive_scan(scheduler auto sch,                 // 2
                                 std::span<const double> input,      // 1
                                 std::span<double> output,           // 1
                                 double init,                        // 1
                                 std::size_t tile_count)             // 3
{
  std::size_t const tile_size = (input.size() + tile_count - 1) / tile_count;

  std::vector<double> partials(tile_count + 1);                      // 4
  partials[0] = init;                                                // 4

  return just(std::move(partials))                                   // 5
       | continues_on(sch)
       | bulk(tile_count,                                            // 6
           [=](std::size_t i, std::vector<double>& partials) {       // 7
             auto start = i * tile_size;                             // 8
             auto end = std::min(input.size(), (i + 1) * tile_size); // 8
             partials[i + 1] = *--std::inclusive_scan(begin(input) + start,   // 9
                                                      begin(input) + end,     // 9
                                                      begin(output) + start); // 9
           })                                                        // 10
       | then(                                                       // 11
           [](std::vector<double>&& partials) {
             std::inclusive_scan(begin(partials), end(partials),     // 12
                                 begin(partials));                   // 12
             return std::move(partials);                             // 13
           })
       | bulk(tile_count,                                            // 14
           [=](std::size_t i, std::vector<double>& partials) {       // 14
             auto start = i * tile_size;                             // 14
             auto end = std::min(input.size(), (i + 1) * tile_size); // 14
             std::for_each(begin(output) + start, begin(output) + end, // 14
               [&](double& e) { e = partials[i] + e; }               // 14
             );
           })
       | then(                                                       // 15
           [=](std::vector<double>&& partials) {                     // 15
             return output;                                          // 15
           });                                                       // 15
}
Here is the explanation from proposal P2300R10. The numbers refer to the numbered comments in the code.
1. The function scans a sequence of doubles (represented as the std::span<const double> input) and stores the result in another sequence of doubles (represented as the std::span<double> output).
2. It takes a scheduler, which specifies the execution resource on which the scan should be launched.
3. It also takes a tile_count parameter, which controls the number of execution agents that are spawned.
4. First, we need to allocate the temporary storage required by the algorithm, which we do with a std::vector, partials. We need one double of temporary storage for each execution agent we create.
5. Next, we create our initial sender with execution::just and execution::continues_on. These senders send the temporary storage, which we have moved into the sender. The sender has a completion scheduler of sch, which means the next item in the chain will use sch.
6. Senders and sender adaptors support composition via operator|, similar to C++ ranges. We use operator| to attach the next piece of work, which spawns tile_count execution agents using execution::bulk.
7. Each agent calls a std::invocable and passes it two arguments. The first is the agent's index (i) in the execution::bulk operation, in this case a unique integer in [0, tile_count). The second argument is what the input sender sent: the temporary storage.
8. We start by computing the start and end of the range of input and output elements that this agent is responsible for, based on our agent index.
9. Then we perform a sequential std::inclusive_scan over our elements. We store the scan result for our last element, which is the sum of all of our elements, in our temporary storage partials.
10. After all computation in this initial execution::bulk pass has completed, every spawned execution agent has written the sum of its elements into its slot in partials.
11. Now we need to scan all of the values in partials. We do this with a single execution agent that runs after the execution::bulk completes. We create this execution agent with execution::then.
12. execution::then takes an input sender and a std::invocable and calls the std::invocable with the value sent by the input sender. Inside our std::invocable, we call std::inclusive_scan on partials, which the input sender sends to us.
13. We then return partials, which the next phase needs.
14. Finally, we run another execution::bulk of the same shape as before. In this execution::bulk, we use the scanned values in partials to integrate the sums from the other tiles into our elements, completing the inclusive scan.
15. async_inclusive_scan returns a sender that sends the output std::span<double>. A consumer of the algorithm can chain additional work that uses the scan result. At the point at which async_inclusive_scan returns, the computation may not have completed. In fact, it may not have even started.
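The three phases described above (scan each tile, scan the tile totals, add the offsets) can be sketched without senders. The following is a sequential illustration that uses only the standard library, not the implementation from the proposal: each loop over the tiles stands in for one execution::bulk step, and the tiles are processed one after another. The name tiled_inclusive_scan and the clamping of start for empty trailing tiles are assumptions added for this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <numeric>
#include <vector>

// Sequential sketch of the tiled inclusive scan (hypothetical helper).
void tiled_inclusive_scan(const std::vector<int>& input,
                          std::vector<int>& output,
                          int init,
                          std::size_t tile_count) {
    assert(tile_count > 0 && input.size() == output.size());
    const std::size_t tile_size = (input.size() + tile_count - 1) / tile_count;

    // One slot per tile, plus slot 0 for the initial value.
    std::vector<int> partials(tile_count + 1);
    partials[0] = init;

    // Phase 1: scan every tile on its own and record the tile's total.
    for (std::size_t i = 0; i < tile_count; ++i) {
        // Assumption of this sketch: clamp start so trailing tiles may be empty.
        const std::size_t start = std::min(input.size(), i * tile_size);
        const std::size_t end = std::min(input.size(), (i + 1) * tile_size);
        if (start == end) {
            partials[i + 1] = 0;  // an empty tile contributes nothing
            continue;
        }
        partials[i + 1] = *std::prev(std::inclusive_scan(
            input.begin() + start, input.begin() + end, output.begin() + start));
    }

    // Phase 2: scan the totals; partials[i] becomes the offset for tile i.
    std::inclusive_scan(partials.begin(), partials.end(), partials.begin());

    // Phase 3: add each tile's offset to its locally scanned elements.
    for (std::size_t i = 0; i < tile_count; ++i) {
        const std::size_t start = std::min(input.size(), i * tile_size);
        const std::size_t end = std::min(input.size(), (i + 1) * tile_size);
        std::for_each(output.begin() + start, output.begin() + end,
                      [&](int& e) { e += partials[i]; });
    }
}
```

Because phases 1 and 3 touch disjoint ranges of output per tile, the tiles within each phase are independent of one another, which is what allows execution::bulk to run them on separate execution agents.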
Senders
- just(values): Returns a sender with no completion scheduler that sends the provided values. just is a sender factory.
- bulk(input, shape, call): Returns a sender that describes the callable call invoked on input according to shape.
- continues_on(input, scheduler): Returns a sender that describes the transition from the execution agent of the input sender to the execution agent of the target scheduler.
- then(input, call): Returns a sender that describes the task graph of the input sender, extended by a node that invokes the provided function call with the values sent by the input sender.
Wouldn’t it be great to see this program in action? Currently (December 2024), no compiler supports std::execution or the concepts sender and scheduler.
The reference implementation stdexec helps here. In my version, I changed the data type of the processed elements from double to int:
// inclusiveScanExecution.cpp

#include <algorithm>
#include <exec/static_thread_pool.hpp>
#include <iostream>
#include <numeric>
#include <span>
#include <stdexec/execution.hpp>
#include <vector>

auto async_inclusive_scan(auto sch,                        // 2
                          std::span<const int> input,      // 1
                          std::span<int> output,           // 1
                          int init,                        // 1
                          std::size_t tile_count)          // 3
{
  std::size_t const tile_size = (input.size() + tile_count - 1) / tile_count;

  std::vector<int> partials(tile_count + 1);               // 4
  partials[0] = init;                                      // 4

  return stdexec::just(std::move(partials))                // 5
       | stdexec::continues_on(sch)
       | stdexec::bulk(tile_count,                         // 6
           [=](std::size_t i, std::vector<int>& partials) { // 7
             auto start = i * tile_size;                   // 8
             auto end =
                 std::min(input.size(), (i + 1) * tile_size); // 8
             partials[i + 1] =
                 *--std::inclusive_scan(begin(input) + start,   // 9
                                        begin(input) + end,     // 9
                                        begin(output) + start); // 9
           })                                              // 10
       | stdexec::then(                                    // 11
           [](std::vector<int>&& partials) {
             std::inclusive_scan(begin(partials), end(partials), // 12
                                 begin(partials));         // 12
             return std::move(partials);                   // 13
           })
       | stdexec::bulk(tile_count,                         // 14
           [=](std::size_t i, std::vector<int>& partials) { // 14
             auto start = i * tile_size;                   // 14
             auto end = std::min(input.size(), (i + 1) * tile_size); // 14
             std::for_each(begin(output) + start, begin(output) + end, // 14
               [&](int& e) { e = partials[i] + e; }        // 14
             );
           })
       | stdexec::then(                                    // 15
           [=](std::vector<int>&& partials) {              // 15
             return output;                                // 15
           });                                             // 15
}

int main() {
  std::cout << '\n';

  std::vector<int> input(30);
  std::iota(begin(input), end(input), 0);

  for (auto e : input) {
    std::cout << e << ' ';
  }
  std::cout << '\n';

  std::vector<int> output(input.size());

  exec::static_thread_pool pool(8);
  auto sch = pool.get_scheduler();

  auto [out] =
      stdexec::sync_wait(async_inclusive_scan(sch, input, output, 0, 4))
          .value();

  for (auto e : out) {
    std::cout << e << ' ';
  }
  std::cout << '\n';
}
In main, a std::vector named input is created with 30 elements. The std::iota function fills the input vector with consecutive integers, starting at 0. The program then prints the contents of the vector to the console.
Next, a std::vector named output with the same size as the input vector is created to store the results of the inclusive scan. The exec::static_thread_pool pool has eight threads that are used to execute tasks concurrently. The thread pool's get_scheduler member function creates a scheduler object, sch.
The function async_inclusive_scan takes the scheduler sch, the vector input, the vector output, the initial value 0, and a tile count of 4. It performs the inclusive scan asynchronously on the specified scheduler and returns a future-like object. The function stdexec::sync_wait waits synchronously for the async_inclusive_scan operation to complete, and the result is unpacked into the variable out.
Finally, the program prints the contents of the vector out to the console.
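Since the input holds the integers 0 to 29 and the initial value is 0, element i of the scan should equal i * (i + 1) / 2, and the last element should be 435. As a cross-check that does not depend on stdexec, the following minimal sketch computes the same values sequentially. The helper names scan_of_iota and matches_triangular_numbers are hypothetical, and the size guard assumes a 32-bit int.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Sequential inclusive scan of 0, 1, ..., n - 1 (hypothetical helper).
std::vector<int> scan_of_iota(std::size_t n) {
    assert(n <= 65536);  // keeps every sum within a 32-bit int (assumption)
    std::vector<int> values(n);
    std::iota(values.begin(), values.end(), 0);
    std::inclusive_scan(values.begin(), values.end(), values.begin());
    return values;
}

// Checks that element i equals the closed form i * (i + 1) / 2.
bool matches_triangular_numbers(const std::vector<int>& scanned) {
    for (std::size_t i = 0; i < scanned.size(); ++i) {
        if (scanned[i] != static_cast<int>(i * (i + 1) / 2)) {
            return false;
        }
    }
    return true;
}
```

Comparing the asynchronous result against scan_of_iota(30) is one simple way to confirm that the tiling does not change the outcome.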
What's next?
In my next blog post, I'll take a step back and explain the composition of senders using operator|.
(rme)
