TMtxForLoop.BlockGranularity Property

Block granularity for TThreadingMode.tmForLoop.

Pascal

property BlockGranularity: integer;

Use this property to address asymmetric multi-processing.

By default, the for-loop range is split equally among all available threads. BlockGranularity specifies a further split of the index range assigned to each thread. When this value is 1, each thread calls the compute event/callback once; when it is 2, twice; and so on, each time with correspondingly adjusted index parameters in the callback event.
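The splitting described above can be pictured with a small stand-alone sketch. It is not MtxVec code: `BlockBounds` is a hypothetical helper, and the assumption that the range is cut into ThreadCount x BlockGranularity nearly equal blocks is an illustration only, since the exact boundaries used by the library may differ.

```pascal
program BlockSplitSketch;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFDEF MSWINDOWS}{$APPTYPE CONSOLE}{$ENDIF}

// Hypothetical helper, not part of MtxVec: returns the inclusive bounds of
// block BlockIdx when [IdxMin, IdxMax] is cut into BlockCount nearly equal parts.
procedure BlockBounds(IdxMin, IdxMax, BlockCount, BlockIdx: Integer;
  out FirstIdx, LastIdx: Integer);
var
  Len: Integer;
begin
  Len := IdxMax - IdxMin + 1;
  FirstIdx := IdxMin + (Len * BlockIdx) div BlockCount;
  LastIdx := IdxMin + (Len * (BlockIdx + 1)) div BlockCount - 1;
end;

const
  ThreadCount = 4;        // assumed number of worker threads
  BlockGranularity = 2;   // assumed blocks per thread
var
  i, FirstIdx, LastIdx: Integer;
begin
  // 4 threads x granularity 2 = 8 blocks over the index range 0..999
  for i := 0 to ThreadCount * BlockGranularity - 1 do
  begin
    BlockBounds(0, 999, ThreadCount * BlockGranularity, i, FirstIdx, LastIdx);
    WriteLn('block ', i, ': ', FirstIdx, '..', LastIdx);
  end;
end.
```

With these assumed values, each of the eight blocks covers 125 indices, from 0..124 up to 875..999, and the callback would be invoked once per block.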

This is useful when:

The load can vary depending on the values to be processed.
The threads are running on CPU architectures with particularly large turbo frequencies for individual cores.
The threads are running on asymmetric CPU architectures like Intel Alder Lake, where not all CPU cores are equally fast (P+E).
The amount of computation to be done within the threads cannot be predicted accurately.

If some threads finish sooner than others, they can start working on other unfinished sections without waiting for the slowest thread. The drawback is that some threads may never be launched at all, because the job may finish before every thread gets its piece, while other threads process multiple pieces. Do not assume that all "ThreadIndex" values will be used in the callback event when BlockGranularity is greater than 1.
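The effect of letting faster threads pick up unfinished sections can be illustrated with a toy simulation. This is a sketch under stated assumptions, not the scheduler used by MtxVec: the worker count, the relative speeds (two fast and two slow cores), the uniform work per index, and the `Makespan` function are all hypothetical.

```pascal
program GreedySketch;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFDEF MSWINDOWS}{$APPTYPE CONSOLE}{$ENDIF}

const
  WorkerCount = 4;
  TotalWork = 96;  // abstract work units, spread evenly over the index range
  // Assumed relative speeds: two fast cores and two slow cores.
  Speed: array[0..WorkerCount - 1] of Integer = (2, 2, 1, 1);

// Simulated time until the last worker finishes when the range is cut into
// WorkerCount * Granularity equal blocks and the worker that becomes idle
// first always takes the next unprocessed block.
function Makespan(Granularity: Integer): Integer;
var
  FreeAt: array[0..WorkerCount - 1] of Integer;
  BlockWork, Block, w, Next: Integer;
begin
  for w := 0 to WorkerCount - 1 do
    FreeAt[w] := 0;
  BlockWork := TotalWork div (WorkerCount * Granularity);
  for Block := 1 to WorkerCount * Granularity do
  begin
    // find the worker that is idle earliest (lowest index wins ties)
    Next := 0;
    for w := 1 to WorkerCount - 1 do
      if FreeAt[w] < FreeAt[Next] then
        Next := w;
    FreeAt[Next] := FreeAt[Next] + BlockWork div Speed[Next];
  end;
  // the job is done when the slowest worker has finished its last block
  Result := 0;
  for w := 0 to WorkerCount - 1 do
    if FreeAt[w] > Result then
      Result := FreeAt[w];
end;

begin
  WriteLn('BlockGranularity = 1: ', Makespan(1));  // prints 24
  WriteLn('BlockGranularity = 4: ', Makespan(4));  // prints 18
end.
```

In this model, a granularity of 1 leaves the fast workers idle for half of the run (total time 24), while a granularity of 4 lets them take over blocks the slow workers have not reached yet (total time 18). The simulation ignores thread launch latency, which is the reason some threads may never start in practice.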

An alternative to this parameter is to set ThreadCount to 2x, 4x, etc. the CPU core count. Increasing ThreadCount beyond the CPU core count is, however, not recommended: running more threads also puts additional pressure on any shared resources protected by critical sections, such as memory managers and the MtxVec object cache.

This parameter applies only to:

crefOnForLoopRange
crefOnForLoopRangeFun
crefOnForLoopRangeAnn

When this value is greater than 1, a "greedy" job distribution approach is used to distribute work among threads. The default value is 1 (disabled). A value of 4 gives a 4x shorter average job running time than a value of 1, with a smaller spread of running times, and is recommended for real-time processing of audio and video.

Group

public

Links

TMtxForLoop Class, TMtxForLoop Members, MtxForLoop Namespace, public
