TMtxVecBase.BlockInit Method ()
Dew Math for .NET
Example

Normal vectorized procedure: 

 

procedure ParetoPDF(X: TVec; a, b: double; Res: TVec); overload;
begin
  Res.Size(X);
  Res.Power(x,-(a+1));
  Res.Mul(Power(b,a)*a);
end;
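For reference, the two calls in the procedure above map onto the two factors of the Pareto density. The formula below is a sketch that assumes the usual Pareto parameterization with shape a and scale b; the page itself does not state the formula, so treat the parameter roles as an assumption.

```latex
f(x; a, b) = \frac{a\, b^{a}}{x^{a+1}}
           = \underbrace{x^{-(a+1)}}_{\texttt{Res.Power(x,-(a+1))}}
             \cdot
             \underbrace{a\, b^{a}}_{\texttt{Res.Mul(Power(b,a)*a)}},
\qquad x \ge b > 0,\ a > 0
```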

 

Vectorized and blocked version of the Pareto probability distribution procedure: 

 

procedure ParetoPDF(X: TVec; a, b: double; Res: TVec); overload;
begin
  Res.Size(X);
  Res.BlockInit;
  X.BlockInit;
  while not X.BlockEnd do
  begin
    Res.Power(x,-(a+1));
    Res.Mul(Power(b,a)*a);
    Res.BlockNext;
    X.BlockNext;
  end;
end;

 

Alternative: 

 

procedure ParetoPDF(X: TVec; a, b: double; Res: TVec); overload;
var x1: Vector;
begin
  Res.Size(X);
  Res.BlockInit;
  x1.BlockInit(X);
  while not x1.BlockEnd do
  begin
    Res.Power(x1,-(a+1)); // x1 is the current block of X
    Res.Mul(Power(b,a)*a);
    Res.BlockNext;
    x1.BlockNext;
  end;
end;

 

The blocked version of ParetoPDF will execute faster than the non-blocked version when X contains roughly 5000-10000 elements or more (double precision). Below that size the two versions perform about the same, except for very short vectors (below 50 elements), where the non-blocked version has a slight advantage because it avoids the overhead of the block processing methods. The time is saved between the calls to Res.Power(x,-(a+1)) and Res.Mul(Power(b,a)*a), where the same memory (stored in the Res vector) is accessed in two consecutive calls. That memory is loaded into the CPU cache on the first call and is still there for the second call, provided the Length of the Res vector is short enough to fit in the cache.

As an exercise, you can also compare the performance of the vectorized and blocked versions of the function with the single-value version (ParetoPDF(X: double; a, b: double; Res: double)) and measure the execution time of each version for long vectors (100 000 elements) and short vectors (10 elements).
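The cache argument above can be illustrated without MtxVec at all. The following is a minimal sketch using plain dynamic arrays and the standard Math unit: it runs the same two passes (power, then multiply) one block at a time and checks the result against a single-value function. The names ParetoPDFScalar, ProcessBlock, ParetoPDFBlocked and the block length of 1024 are hypothetical choices made for this illustration; they are not part of the library and do not reflect the block size MtxVec actually uses.

```pascal
program ParetoBlockSketch;

{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

uses
  Math;

const
  BlockLen = 1024; // hypothetical block length; not MtxVec's actual value

type
  TDoubleArray = array of Double;

// Single-value version: x^-(a+1) * (b^a * a)
function ParetoPDFScalar(x, a, b: Double): Double;
begin
  Result := Power(x, -(a + 1)) * (Power(b, a) * a);
end;

// Two passes over one block of Res, mirroring Res.Power followed by Res.Mul
procedure ProcessBlock(const X: TDoubleArray; var Res: TDoubleArray;
  First, Last: Integer; a, b: Double);
var
  i: Integer;
  c: Double;
begin
  c := Power(b, a) * a;
  for i := First to Last do
    Res[i] := Power(X[i], -(a + 1)); // first pass writes the block
  for i := First to Last do
    Res[i] := Res[i] * c;            // second pass revisits the same block
end;

// Walks X in blocks of at most BlockLen elements
procedure ParetoPDFBlocked(const X: TDoubleArray; a, b: Double;
  var Res: TDoubleArray);
var
  First, Last, Count: Integer;
begin
  Count := Length(X);
  SetLength(Res, Count);
  First := 0;
  while First < Count do
  begin
    Last := Min(First + BlockLen, Count) - 1;
    ProcessBlock(X, Res, First, Last, a, b);
    First := Last + 1;
  end;
end;

const
  N = 5000;
  ShapeA = 3.0;
  ScaleB = 1.0;

var
  Xs, Ys: TDoubleArray;
  i: Integer;
  d, MaxDiff: Double;
begin
  SetLength(Xs, N);
  for i := 0 to N - 1 do
    Xs[i] := ScaleB + 0.01 * i; // keep every x at or above the scale b

  ParetoPDFBlocked(Xs, ShapeA, ScaleB, Ys);

  // Compare the blocked result with the single-value version
  MaxDiff := 0;
  for i := 0 to N - 1 do
  begin
    d := Abs(Ys[i] - ParetoPDFScalar(Xs[i], ShapeA, ScaleB));
    if d > MaxDiff then
      MaxDiff := d;
  end;
  WriteLn('Max abs difference, blocked vs scalar: ', MaxDiff);
end.
```

To turn this into the timing exercise, wrap the calls in a timer of your choice and vary N and BlockLen; the sketch only checks that the blocked traversal produces the same values, it does not measure speed.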

The gains from block processing depend strongly on the "fatness" of the individual functions, that is, on how much computation they perform per memory access. When the number of memory accesses is high relative to the amount of computation, block processing will thrive.

Copyright (c) 1999-2024 by Dew Research. All rights reserved.